Hi,
I'm trying to parse two columns out of various text strings. The first column should hold the 5/five and the second should hold the year/years from the text strings below (each full string is anywhere from a sentence to a paragraph):
1) have 5-year option
2) of 2, 5 years option
3) two five-year options
Is it always going to be 5|five, or will you have other numbers that you want to pull out?
Using the RegEx tool with 'Parse' as the output method: (.*)(year.*)
I would change that first expression to be
(\d+|one|two|three|four|five|six|seven|eight|nine|ten)
If you need to go higher you can add eleven-nineteen, then twenty, thirty, forty, etc. It's long and awkward, but I'm not seeing a better option with a cursory search.
Ah, yes, I may have misread what the result needs to be. If the first column needs to be only the number, then this is the only route I can think of. If the second column needs to be only "years" or "year", then change the second group to (years|year)
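For anyone who wants to test the expression outside Alteryx, here is a minimal Python sketch of the same idea. It is only an illustration: the `extract` helper, the `[\s-]` separator, and the `\b` word boundaries are my own assumptions and not part of the RegEx tool configuration above.

```python
import re

# Assumption: number words only go up to ten, as in the expression above.
NUMBER = r"\d+|one|two|three|four|five|six|seven|eight|nine|ten"

# Number, then a single space or hyphen (assumed separator), then years/year.
PATTERN = re.compile(rf"\b({NUMBER})[\s-](years|year)\b", re.IGNORECASE)


def extract(text):
    """Return (number, unit) for the first match in text, or None."""
    match = PATTERN.search(text)
    return match.groups() if match else None


samples = ["have 5-year option", "of 2, 5 years option", "two five-year options"]
for sample in samples:
    print(sample, "->", extract(sample))
```

Because the number has to sit directly before year/years, the "2" in the second sample and the "two" in the third are skipped, and all three samples return the 5/five.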
This regex may be too brittle depending on your actual use case, but it works in this example:
(.*\W)(\w+)\W(years*)(.*)
RegExOut2 and RegExOut3 should be your number and the word year/years.
From there you can probably do some kind of conversion for the word "five".
This assumes that the number of years always comes directly before the word "year", and that there is always exactly one non-alphanumeric character between the two. Depending on the data you are dealing with, you may need to modify the expression, or cleanse your data into this format first.
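As a rough illustration of the parse plus the conversion step, here is a small Python sketch that uses the expression above unchanged. The `WORDS` lookup table and the `parse_term` helper are hypothetical names made up for this example, and the table only covers one through ten.

```python
import re

# The expression from the post, unchanged.
PATTERN = re.compile(r"(.*\W)(\w+)\W(years*)(.*)")

# Hypothetical lookup table for the number-word conversion (one to ten only).
WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def parse_term(text):
    """Return (number, unit) taken from groups 2 and 3, or None if no match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    raw, unit = match.group(2), match.group(3)
    # Digits convert directly; words go through the lookup (None if unknown).
    number = int(raw) if raw.isdecimal() else WORDS.get(raw.lower())
    return number, unit


for sample in ["have 5-year option", "of 2, 5 years option", "two five-year options"]:
    print(sample, "->", parse_term(sample))
```

Groups 2 and 3 here correspond to RegExOut2 and RegExOut3 in the RegEx tool output.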
If the sentence starts with the number, as in "5 year plan", there is nothing in front of it for the first group to match, so we have to make the first group optional with "?":
(.*\W)?(\w+)(\W)(year)
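Below are the results without and with the "?" on the first group:
Without "?": the first group must match at least one character before the number, so "5 year plan" produces no match.
With "?": the first group is optional, so "5 year plan" matches and the 5 lands in the second group.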
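The difference can be reproduced with a short Python sketch that runs both expressions against the same strings. The `groups` helper and the constant names are hypothetical names for this example, and Python's `re` module stands in for the Alteryx RegEx tool here.

```python
import re

# First group required vs. first group optional, as in the expressions above.
WITHOUT_OPT = re.compile(r"(.*\W)(\w+)(\W)(year)")
WITH_OPT = re.compile(r"(.*\W)?(\w+)(\W)(year)")


def groups(pattern, text):
    """Return the captured groups for the first match, or None."""
    match = pattern.search(text)
    return match.groups() if match else None


text = "5 year plan"
print(groups(WITHOUT_OPT, text))  # None: nothing precedes the number
print(groups(WITH_OPT, text))     # (None, '5', ' ', 'year')

# Strings with leading text still parse the same way with the optional group.
print(groups(WITH_OPT, "have 5-year option"))  # ('have ', '5', '-', 'year')
```

Note that when the optional group does not take part in the match, its value comes back empty (`None` in Python), so the number is still in the second group either way.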