Hi everyone,
Hoping someone can give me a hand with a RegEx expression.
In my Status Column I have embedded newlines, and each value is separated by two carriage returns. I need to parse the Status Column into as many columns as needed (no more than five). Here's what it looks like:
Record | Status |
1 | Status 1 (United States of America)
Status 2 (United States of America) |
2 | Status 1 (United States of America) |
3 | Status 2 (United States of America)
Status 3 (United States of America)
Status 4 (United States of America) |
A few comments:
I used RegEx /n to remove the carriage returns. Then I thought I could do an additional RegEx to basically separate the values anytime the phrase "(United States of America)" popped up. That didn't work.
Any help writing the expression would be much appreciated!
Solved! Go to Solution.
Hey Sparty!
Let's go about this slowly.
Formula tool (create Status_EZ as vstring):
regex_replace([Status],"\n\n",'|')
This will replace your 2 new line delimiters with a PIPE.
Next you can use a Text to Columns tool.
I like this approach because it is easy to read/explain.
Cheers,
Mark
I would suggest using a Regex tool in Tokenise mode with the expression:
(.*?)\(United States of America\)\r?\n?\r?\n?
I think it should do what you want.
To explain what it does. It looks for blocks which have "(United States of America)" after then. The brackets in this need to be escaped hence the \( and \). It then also looks for following new lines (in either Unix or Windows format)., allowing for at most 2 of them. By marking the block as `(.*?)` it will be non-greedy (i.e. not join everything together!). The tokenize mode of the Regex tool is happy if there is a single marked block to use this as the item that is repeated.
Finally configure the regex tool to split to columns and set count to 5 columns.
Result looks like
Absolutely easy to read! Thanks!
I'd like to become more self-sufficient using the ole RegEx tool. Seems like it packs a powerful punch! Any websites, books, resources you'd recommend?
you can refer to this website for additional information about regex syntax
http://www.boost.org/doc/libs/1_64_0/libs/regex/doc/html/boost_regex/syntax/basic_extended.html
I'd recommend the https://regex101.com/ site
It lets you play and does a decent job of explaining the meaning of what you have written
Check this out @Shevans
@jdunkerley79 this is really helpful but I am struggling. In my case i have a 100 or so cells with the same format:
i have been playing with your formula, as well as regex101, but can't quit figure out how to get this into the two columns I need. Any thoughts on what in your formula I need to change? Thank you.
The first paragraph has many words, may have commas, and may be more than one sentence
The second paragraph is same format, but has what I need to extract. |