Parsing phrase from text string
Hi,
I'm trying to parse two columns from various text strings. I need the first column to contain the number (5 or "five") and the second to contain "year" or "years", taken from text strings like the ones below (each full string is anywhere from a sentence to a paragraph long):
1) have 5-year option
2) of 2, 5 years option
3) two five-year options
- Labels:
- Datasets
- Expression
- Parse
- Regex
Is it always going to be 5|five, or will you have other numbers that you want to pull out?
Stephen Ruhl
Principal Customer Support Engineer
Using the RegEx tool with 'Parse' as the output method: (.*)(year.*)
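A quick way to see what the two groups of (.*)(year.*) capture is a rough Python check. Python's `re` module stands in for the Alteryx RegEx tool here, so engine details may differ. `parse_two_columns` is a hypothetical helper name.

```python
import re

# Group 1 = everything before the last "year"; group 2 = "year" and the rest.
PATTERN = re.compile(r"(.*)(year.*)")

def parse_two_columns(text):
    """Return the two capture groups (like two Parse output columns), or None."""
    m = PATTERN.search(text)
    return m.groups() if m else None

for s in ["have 5-year option", "of 2, 5 years option", "two five-year options"]:
    print(parse_two_columns(s))
```

Group 1 keeps all the leading text ("have 5-", "of 2, 5 "), not just the number, which is why the first group needs to be narrowed.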
I would change that first group to:
(\d+|one|two|three|four|five|six|seven|eight|nine|ten)
If you need to go higher, you can add eleven through nineteen, then twenty, thirty, forty, and so on. It's long and awkward, but I'm not seeing a better option with a cursory search.
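One way to keep the long alternation manageable is to build it from a list and reuse the same list to turn a captured word into a number. The following is a minimal Python sketch. `NUMBER_WORDS`, `NUMBER_GROUP` and `to_int` are hypothetical names, and the list stops at "ten" as in the reply above.

```python
import re

# Number words from the reply above; append "eleven", "twelve", ... as needed.
NUMBER_WORDS = ["one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]
WORD_TO_INT = {w: i for i, w in enumerate(NUMBER_WORDS, start=1)}

# Same text as (\d+|one|two|...|ten), generated instead of typed by hand.
NUMBER_GROUP = r"(\d+|" + "|".join(NUMBER_WORDS) + ")"

def to_int(token):
    """Convert a captured number ('5' or 'five') to an int."""
    return int(token) if token.isdigit() else WORD_TO_INT[token.lower()]

print(NUMBER_GROUP)
print(re.findall(NUMBER_GROUP, "two five-year options"))  # every number-like token
print(to_int("5"), to_int("five"))
```

The group has no word boundaries, so it would also match "ten" inside "often". Wrapping it as `\b` + group + `\b` avoids that.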
Stephen Ruhl
Principal Customer Support Engineer
Ah, yes, I may have misread what the result needs to be. If the first column needs to contain only the number, then this is the only route I can think of. If the second column needs to contain only "year" or "years", change the second group to (years|year).
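Putting the number group and (years|year) together gives one possible two-column pattern. This Python sketch *assumes* that a single non-word character (space or hyphen) separates the number from "year", and that matching should be case-insensitive. Neither assumption comes from the thread.

```python
import re

# Group 1: number as digits or a word; group 2: "years" or "year".
# "years" is listed first so the longer form is tried before "year".
# \W (one non-word character between them) and IGNORECASE are assumptions.
PATTERN = re.compile(
    r"(\d+|one|two|three|four|five|six|seven|eight|nine|ten)\W(years|year)",
    re.IGNORECASE,
)

examples = ["have 5-year option", "of 2, 5 years option", "two five-year options"]
for s in examples:
    m = PATTERN.search(s)
    print(m.groups() if m else None)
```

In the second example, the "2" is skipped because it is followed by ", " rather than by "year".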
This regex may be too brittle depending on your actual use case, but it works in this example:
(.*\W)(\w+)\W(years*)(.*)
RegExOut2 and RegExOut3 should be your number and the word year/years.
From there, you can convert number words such as "five" to digits.
This assumes that the number always immediately precedes the word "year", separated by exactly one non-word character (anything other than a letter, digit, or underscore, such as a space or hyphen). Depending on your data, you may need to modify the pattern, or first cleanse the data into this format.
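To see both the matches and the "exactly one separator" assumption in action, here is a rough Python check of the posted pattern. Python's `re` stands in for the RegEx tool. The helper `number_and_unit` and the fourth test string are hypothetical.

```python
import re

# Pattern as posted: (.*\W)(\w+)\W(years*)(.*)
# Group 2 ~ RegExOut2 (the number), group 3 ~ RegExOut3 ("year"/"years").
# "years*" means "year" followed by zero or more "s" characters.
PATTERN = re.compile(r"(.*\W)(\w+)\W(years*)(.*)")

def number_and_unit(text):
    """Return (number, year/years) or None if the pattern does not match."""
    m = PATTERN.match(text)
    return (m.group(2), m.group(3)) if m else None

tests = [
    "have 5-year option",
    "of 2, 5 years option",
    "two five-year options",
    "two five - year options",  # hypothetical: three characters between "five" and "year"
]
for s in tests:
    print(repr(s), "->", number_and_unit(s))
```

The last string returns None because " - " is more than one non-word character. This is the kind of input that would need cleansing first.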
If the string starts with the number (for example, "5 year plan"), there is nothing before the number for the first group to match, so make the first group optional by adding "?":
(.*\W)?(\w+)(\W)(year)
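The effect of the "?" can be reproduced with a small Python comparison. `re.match` is used here as an approximation of the RegEx tool's Parse behavior, which is an assumption. `WITHOUT_Q` and `WITH_Q` are hypothetical names.

```python
import re

WITHOUT_Q = re.compile(r"(.*\W)(\w+)(\W)(year)")   # first group required
WITH_Q = re.compile(r"(.*\W)?(\w+)(\W)(year)")     # first group optional

for s in ["5 year plan", "have 5-year option"]:
    for name, pat in [("without ?", WITHOUT_Q), ("with ?", WITH_Q)]:
        m = pat.match(s)
        print(f"{s!r:22} {name:10} {m.groups() if m else None}")
```

Without the "?", "5 year plan" does not match at all. With it, the first group is simply empty (None) and the number is still captured.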
[Screenshots: example results without "?" and with "?"]
