Alteryx Designer Desktop Discussions

jeremyblaney · ‎07-04-2017

Hi everyone,

Hoping someone can give me a hand with a RegEx expression.

In my Status Column I have embedded newlines, and each value is separated by two carriage returns. I need to parse the Status Column into as many columns as needed (no more than five). Here's what it looks like:

Record	Status
1	Status 1 (United States of America) Status 2 (United States of America)
2	Status 1 (United States of America)
3	Status 2 (United States of America) Status 3 (United States of America) Status 4 (United States of America)

A few comments:

All values in the Status Column end with "(United States of America)".
The max number of values in a single cell is five

I used RegEx /n to remove the carriage returns. Then I thought I could do an additional RegEx to basically separate the values anytime the phrase "(United States of America)" popped up. That didn't work.

Any help writing the expression would be much appreciated!

Joe_Mako · ‎07-04-2017

With the tool Text to Columns, you can use \n as the deliminator and set it to skip empty, attached are a couple of uses:

MarqueeCrew · ‎07-04-2017

Hey Sparty!

Let's go about this slowly.

Formula tool (create Status_EZ as vstring):

regex_replace([Status],"\n\n",'|')

This will replace your 2 new line delimiters with a PIPE.

Next you can use a Text to Columns tool.

Field to Split is: Status_EZ
Delimiter is: | (pipe)
Split to Columns: 5
Extra: Leave in last field
Output Root: Status_

I like this approach because it is easy to read/explain.

Cheers,

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.

jdunkerley79 · ‎07-04-2017

I would suggest using a Regex tool in Tokenise mode with the expression:

(.*?)\(United States of America\)\r?\n?\r?\n?

I think it should do what you want.

To explain what it does. It looks for blocks which have "(United States of America)" after then. The brackets in this need to be escaped hence the \( and \). It then also looks for following new lines (in either Unix or Windows format)., allowing for at most 2 of them. By marking the block as `(.*?)` it will be non-greedy (i.e. not join everything together!). The tokenize mode of the Regex tool is happy if there is a single marked block to use this as the item that is repeated.

Finally configure the regex tool to split to columns and set count to 5 columns.

Result looks like

jeremyblaney · ‎07-04-2017

Absolutely easy to read! Thanks!

I'd like to become more self-sufficient using the ole RegEx tool. Seems like it packs a powerful punch! Any websites, books, resources you'd recommend?

Inactive User · ‎07-04-2017

you can refer to this website for additional information about regex syntax

http://www.boost.org/doc/libs/1_64_0/libs/regex/doc/html/boost_regex/syntax/basic_extended.html

jdunkerley79 · ‎07-05-2017

I'd recommend the https://regex101.com/ site

It lets you play and does a decent job of explaining the meaning of what you have written

Treyson · ‎04-04-2018

Check this out @Shevans

Treyson Marks
Managing Partner
DCG Analytics

cowatson · ‎04-20-2020

@jdunkerley79 this is really helpful but I am struggling. In my case i have a 100 or so cells with the same format:

i have been playing with your formula, as well as regex101, but can't quit figure out how to get this into the two columns I need. Any thoughts on what in your formula I need to change? Thank you.

The first paragraph has many words, may have commas, and may be more than one sentence

The second paragraph is same format, but has what I need to extract.

Alteryx Designer Desktop Discussions

Parse at carriage return/embedded newlines

Re: Macro not Looping thru Files in Folder

Re: Is there any way the computer vision tools can...

Re: Batch Macro

Re: How to get cell reference address from excel

Re: Replacing Forecast columns with Actual Data