regular expression - match string
Hello!
I'm trying to get all the matching strings in a text using a regular expression.
For example, the text is:
1. Title 1
paragraph 1
2. Title 2
paragraph
I need to get all the titles (1. Title 1, 2. Title 2).
I wrote this regular expression: (\d\. )(.)*
Using this expression, I get only the first title. Does anyone know how to get all the matches?
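For readers following along outside the RegEx tool, the "first title only" symptom can be reproduced in Python. This is only a sketch under assumptions: the sample text is taken to contain real newline characters, and Python's `re` module stands in for the tool's regex engine. A single search returns one match, while iterating over all matches returns every title.

```python
import re

# Sample text from the question; assumes real line breaks between lines.
text = "1. Title 1\nparagraph 1\n2. Title 2\nparagraph"

# The pattern exactly as written in the question.
pattern = r"(\d\. )(.)*"

# A single search stops after the first match.
first_only = re.search(pattern, text).group(0)
print(first_only)  # 1. Title 1

# Iterating over every match returns all the titles.
all_titles = [m.group(0) for m in re.finditer(pattern, text)]
print(all_titles)  # ['1. Title 1', '2. Title 2']
```

So in this sketch the pattern itself finds both titles; what differs is whether the caller asks for one match or for all of them.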
- Labels:
- Regex
- Text Mining
Hi @raghadaf8!
Here's one way to do it:
(\d+\.\s\w+)
You can try it with the Tokenize option in the RegEx tool and split the result to rows.
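As a rough illustration of what returning every match looks like, here is a small Python sketch. It assumes the sample text from the question with real newlines, and it uses `re.findall` as a stand-in for the Tokenize option rather than as a description of how the tool works internally.

```python
import re

# Sample text from the question; assumes each title and paragraph is on its own line.
text = "1. Title 1\nparagraph 1\n2. Title 2\nparagraph"

# \w+ matches word characters only, so each match ends at the first space after the title's first word.
matches = re.findall(r"(\d+\.\s\w+)", text)
print(matches)  # ['1. Title', '2. Title']
```

Both numbered lines are found, but each match covers only the number and the first word that follows it.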
Hope this helps!
Deb
Hi @raghadaf8, this can also be done using the text-to-column tool.
I am not actually sure what your use case is, so I have tried to solve it by considering two scenarios.
1. In case the data is in separate rows.
If the data representation is as below
Then you can simply use a text-to-column tool to get the desired result.
2. In case the data is in a single row.
Then you can use the workflow attached in the screenshot.
For scenario 2, regex might have a limitation and may return only the first title (not sure, though).
I hope this helps!
Thanks!
Hi @raghadaf8
Here is another way of doing this: a very easy method using the Matched Group option. To write regular expressions and get hands-on practice, you can use the Rubular editor, where you can easily practice and learn regular expressions.
Attaching the workflow for your reference.
Hi! Thanks for the reply.
But that returns only the first word, and I need it to return the whole title.
Hi again @raghadaf8!
You can give this one a try:
(\d+\.\s.+)
\d+ = one or more digits
\. = a literal period
\s = one whitespace character (here, the space after the period)
.+ = one or more characters after the space (this will capture multi-word titles)
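How far `.+` runs depends on whether `.` is allowed to match line breaks. The Python sketch below shows both behaviours; the sample text and the use of `re.DOTALL` are assumptions for illustration, and which behaviour a given regex tool uses by default is not confirmed here.

```python
import re

# Sample text from the question; assumes real line breaks between lines.
text = "1. Title 1\nparagraph 1\n2. Title 2\nparagraph"
pattern = r"(\d+\.\s.+)"

# Python default: "." does not match a newline, so each match ends at the end of its line.
per_line = re.findall(pattern, text)
print(per_line)  # ['1. Title 1', '2. Title 2']

# With DOTALL, "." also matches newlines, so the greedy ".+" runs to the end of the text.
run_on = re.findall(pattern, text, flags=re.DOTALL)
print(run_on)  # ['1. Title 1\nparagraph 1\n2. Title 2\nparagraph']
```

If a tool returns everything from the first title onward as one value, the second case is a likely explanation.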
P.S. If that doesn't work, will you reply with a little more of your data (anonymized) so we can see the structure?
That returned the whole text starting from the first match, all in one cell.
My data consists of numbered titles, each followed by a paragraph.
Example:
Document H
1. Title text 1
paragraph of title text 1
2. Title text 2
Paragraph of title text 2
end of document H
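For data shaped like this example, one option is a pattern that cannot cross a line break. The Python sketch below is a guess at the structure, not a confirmed solution: it assumes the lines are separated by real newline characters and that every title starts its line with a number and a period. The character class `[^\r\n]+` is my substitution for `.+`.

```python
import re

# The example document from this post; assumes real line breaks between lines.
text = (
    "Document H\n"
    "1. Title text 1\n"
    "paragraph of title text 1\n"
    "2. Title text 2\n"
    "Paragraph of title text 2\n"
    "end of document H"
)

# ^ with MULTILINE anchors each match to the start of a line.
# [ \t]+ allows spaces or tabs after the period but not a line break.
# [^\r\n]+ takes the rest of the line and stops before any line break.
title_pattern = re.compile(r"^\d+\.[ \t]+[^\r\n]+", flags=re.MULTILINE)

titles = title_pattern.findall(text)
print(titles)  # ['1. Title text 1', '2. Title text 2']
```

Because the pattern excludes `\r` and `\n` explicitly, it behaves the same whether or not the engine lets `.` match line breaks.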
Hi @raghadaf8
Have you tried my solution? Also, it would be better if you could share sample data in Excel that exactly matches your requirement. It would help us understand the problem better and provide a reliable solution.
Thanks
Rohit Gupta
Let's try this:
If this doesn't work, then please submit some sample data in Excel, as @Kurohits suggested, so we can see where the breaks are in your data. We can all parse well! We are just guessing about your data structure :-)
