Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Parse at carriage return/embedded newlines

jeremyblaney
7 - Meteor

Hi everyone,

 

Hoping someone can give me a hand with a RegEx expression.

 

In my Status Column I have embedded newlines, and each value is separated by two carriage returns.  I need to parse the Status Column into as many columns as needed (no more than five). Here's what it looks like:

 

RecordStatus
1

Status 1 (United States of America)

 

Status 2 (United States of America) 

2Status 1 (United States of America)
3

Status 2 (United States of America)

 

Status 3 (United States of America)

 

Status 4 (United States of America)

 

A few comments:

  • All values in the Status Column end with "(United States of America)".
  • The max number of values in a single cell is five

I used RegEx /n to remove the carriage returns. Then I thought I could do an additional RegEx to basically separate the values anytime the phrase "(United States of America)" popped up. That didn't work.

 

Any help writing the expression would be much appreciated!

8 REPLIES 8
Joe_Mako
12 - Quasar

With the tool Text to Columns, you can use \n as the deliminator and set it to skip empty, attached are a couple of uses:

 

text to columns.png

MarqueeCrew
20 - Arcturus
20 - Arcturus

Hey Sparty!

 

Let's go about this slowly.

 

Formula tool (create Status_EZ as vstring):

regex_replace([Status],"\n\n",'|')

This will replace your 2 new line delimiters with a PIPE.

 

Next you can use a Text to Columns tool.

  • Field to Split is: Status_EZ
  • Delimiter is: | (pipe)
  • Split to Columns:  5
  • Extra:  Leave in last field
  • Output Root:  Status_

I like this approach because it is easy to read/explain.

 

Cheers,

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
jdunkerley79
ACE Emeritus
ACE Emeritus

I would suggest using a Regex tool in Tokenise mode with the expression:

(.*?)\(United States of America\)\r?\n?\r?\n?

I think it should do what you want.

 

To explain what it does. It looks for blocks which have "(United States of America)" after then. The brackets in this need to be escaped hence the \( and \). It then also looks for following new lines (in either Unix or Windows format)., allowing for at most 2 of them. By marking the block as `(.*?)` it will be non-greedy (i.e. not join everything together!). The tokenize mode of the Regex tool is happy if there is a single marked block to use this as the item that is repeated.

 

Finally configure the regex tool to split to columns and set count to 5 columns.

 

Result looks like

 

2017-07-04_16-56-00.jpg

jeremyblaney
7 - Meteor

Absolutely easy to read! Thanks!

 

I'd like to become more self-sufficient using the ole RegEx tool. Seems like it packs a powerful punch! Any websites, books, resources you'd recommend?

Inactive User
Not applicable

you can refer to this website for additional information about regex syntax

 

http://www.boost.org/doc/libs/1_64_0/libs/regex/doc/html/boost_regex/syntax/basic_extended.html

 

jdunkerley79
ACE Emeritus
ACE Emeritus

I'd recommend the https://regex101.com/ site

 

It lets you play and does a decent job of explaining the meaning of what you have written

Treyson
13 - Pulsar
13 - Pulsar

Check this out @Shevans

Treyson Marks
Senior Analytics Engineer
cowatson
7 - Meteor

@jdunkerley79 this is really helpful but I am struggling.  In my case i have a 100 or so cells with the same format:

 

i have been playing with your formula, as well as regex101, but can't quit figure out how to get this into the two columns I need.  Any thoughts on what in your formula I need to change?  Thank you.

 

The first paragraph has many words, may have commas, and may be more than one sentence

 

The second paragraph is same format, but has what I need to extract.

Labels