community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
Upgrade Alteryx Designer in 10 Steps

Debating whether or not to upgrade to the latest version of Alteryx Designer?

LEARN MORE
SOLVED

Parse at carriage return/embedded newlines

Hi everyone,

 

Hoping someone can give me a hand with a RegEx expression.

 

In my Status Column I have embedded newlines, and each value is separated by two carriage returns.  I need to parse the Status Column into as many columns as needed (no more than five). Here's what it looks like:

 

RecordStatus
1

Status 1 (United States of America)

 

Status 2 (United States of America) 

2Status 1 (United States of America)
3

Status 2 (United States of America)

 

Status 3 (United States of America)

 

Status 4 (United States of America)

 

A few comments:

  • All values in the Status Column end with "(United States of America)".
  • The max number of values in a single cell is five

I used RegEx /n to remove the carriage returns. Then I thought I could do an additional RegEx to basically separate the values anytime the phrase "(United States of America)" popped up. That didn't work.

 

Any help writing the expression would be much appreciated!

Quasar
Quasar

With the tool Text to Columns, you can use \n as the deliminator and set it to skip empty, attached are a couple of uses:

 

text to columns.png

Alteryx Certified Partner
Alteryx Certified Partner

Hey Sparty!

 

Let's go about this slowly.

 

Formula tool (create Status_EZ as vstring):

regex_replace([Status],"\n\n",'|')

This will replace your 2 new line delimiters with a PIPE.

 

Next you can use a Text to Columns tool.

  • Field to Split is: Status_EZ
  • Delimiter is: | (pipe)
  • Split to Columns:  5
  • Extra:  Leave in last field
  • Output Root:  Status_

I like this approach because it is easy to read/explain.

 

Cheers,

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and reboot. Order shall return.

I would suggest using a Regex tool in Tokenise mode with the expression:

(.*?)\(United States of America\)\r?\n?\r?\n?

I think it should do what you want.

 

To explain what it does. It looks for blocks which have "(United States of America)" after then. The brackets in this need to be escaped hence the \( and \). It then also looks for following new lines (in either Unix or Windows format)., allowing for at most 2 of them. By marking the block as `(.*?)` it will be non-greedy (i.e. not join everything together!). The tokenize mode of the Regex tool is happy if there is a single marked block to use this as the item that is repeated.

 

Finally configure the regex tool to split to columns and set count to 5 columns.

 

Result looks like

 

2017-07-04_16-56-00.jpg

Absolutely easy to read! Thanks!

 

I'd like to become more self-sufficient using the ole RegEx tool. Seems like it packs a powerful punch! Any websites, books, resources you'd recommend?

Inactive User
Not applicable

you can refer to this website for additional information about regex syntax

 

http://www.boost.org/doc/libs/1_64_0/libs/regex/doc/html/boost_regex/syntax/basic_extended.html

 

I'd recommend the https://regex101.com/ site

 

It lets you play and does a decent job of explaining the meaning of what you have written

Highlighted
Quasar
Quasar

Check this out @Shevans

Labels