Understanding Regex the Alteryx Way

Bobbins
8 - Asteroid

Morning All,

I am sure some may wonder why I have not posted this in other, non-Alteryx locations; after all, Regex is not unique to Alteryx. However, I try to keep posts about the software I am using in the same place where possible, because if I am asking, others may well have similar questions in time to come.

So, to start off simply, I have been learning to parse the following two rows:

12,000 345,678 910,111,121
0 0 0

If I use the following expression (RegxA for reference):

([\d,]+)\s([\d,]+)\s([\d,]+)

Then we get this:

12,000 | 345,678 | 910,111,121
0 | 0 | 0
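
A minimal sketch for checking this result outside Alteryx, using Python's `re` module. This is an assumption-laden stand-in: Python's engine is not the one Alteryx uses, and the helper name `parse_row` is hypothetical. Each row is treated as its own record and the three capture groups become three output columns.

```python
import re

# RegxA from the post: three runs of digits/commas, each separated by one whitespace character.
REGX_A = re.compile(r"([\d,]+)\s([\d,]+)\s([\d,]+)")

rows = ["12,000 345,678 910,111,121", "0 0 0"]

def parse_row(row):
    """Return the three captured groups as a tuple, or None if the row does not match."""
    m = REGX_A.search(row)
    return m.groups() if m else None

for row in rows:
    print(parse_row(row))
# ('12,000', '345,678', '910,111,121')
# ('0', '0', '0')
```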

My understanding is that each () creates a group for one part of the match, \s matches the whitespace that separates the parts, and [\d,]+ matches one or more characters that are each either a digit or a comma, so each group keeps going until it hits the whitespace. So far so good! The alternative (RegxB for reference),

(\d*\S*)\s(\d*\S*)\s(\d*\S*)

also works; more on this later. But then we want to include words, so if we add:

Myrow 12,000 345,678 910,111,121
0 0 0

then RegxA works just fine and ignores the word. RegxB also runs, but instead gives

Myrow | 12,000 | 345,678
0 | 0 | 0

Which as we can see is quite wrong!
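
The difference can be reproduced with a short Python sketch (again an approximation, since Python's `re` is not Alteryx's engine; `first_match` is a hypothetical helper). Because \S matches any non-whitespace character, including letters, digits and commas, `\d*\S*` behaves much like `\S*` on its own, so RegxB simply takes the first three whitespace-separated pieces of the row, whatever they contain.

```python
import re

REGX_A = re.compile(r"([\d,]+)\s([\d,]+)\s([\d,]+)")
REGX_B = re.compile(r"(\d*\S*)\s(\d*\S*)\s(\d*\S*)")

row = "Myrow 12,000 345,678 910,111,121"

def first_match(pattern, text):
    """Return the capture groups of the first match, or None if there is no match."""
    m = pattern.search(text)
    return m.groups() if m else None

# RegxA can only start on a digit or comma, so the search skips past "Myrow".
print(first_match(REGX_A, row))  # ('12,000', '345,678', '910,111,121')

# RegxB accepts any non-whitespace text, so "Myrow" fills the first group
# and the last number is pushed out of the three available groups.
print(first_match(REGX_B, row))  # ('Myrow', '12,000', '345,678')
```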
Now, before we solve this, to add a bit of complexity, I have added an extra line

Test Piece
Myrow 12,000 345,678 910,111,121
0 0 0

Now if we run both, we get just the numbers for RegxA and using RegxB:

Myrow | 12,000 | 345,678
0 | 0 | 0

What is highly inconsistent here is that the regex101 website tells me it should break the words down (despite \d matching only digits) and that the 0s are not picked up, when Alteryx clearly shows that they are!
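
One possible explanation, offered as an assumption rather than a diagnosis: if all three rows were pasted into the tester as a single block of text, then \s can also match the line breaks, whereas Alteryx applies the expression to each record separately. The Python sketch below compares the two cases for RegxB (`first_groups` is a hypothetical helper, and Python's engine is only a stand-in for either tool).

```python
import re

REGX_B = re.compile(r"(\d*\S*)\s(\d*\S*)\s(\d*\S*)")
rows = ["Test Piece", "Myrow 12,000 345,678 910,111,121", "0 0 0"]

def first_groups(text):
    """Return the capture groups of the first match in text, or None."""
    m = REGX_B.search(text)
    return m.groups() if m else None

# Case 1: each row is its own record (assumed to be how Alteryx sees the data).
# "Test Piece" has only one whitespace character, so it cannot match at all.
per_record = [first_groups(r) for r in rows]
print(per_record)
# [None, ('Myrow', '12,000', '345,678'), ('0', '0', '0')]

# Case 2: all rows joined into one string (assumed to be how the tester was fed).
# The newline counts as \s, so the first match spans two lines and splits the words.
single_string = first_groups("\n".join(rows))
print(single_string)
# ('Test', 'Piece', 'Myrow')
```

In the second case only the first match is shown, which would also be consistent with the zeros not appearing unless a global (find all matches) option is switched on.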

So if we just consider the text, and use the following (RegxC for reference):

(\D*\s\D*)

Then \D matches any non-digit, with the * repeating it until the \s (for a space), followed by \D* again. Excellent, when used by itself.
But add this to the RegxA example and we are back to ignoring the first line of text, and now also the last line of zeros. Adding it to RegxB gives me the second row of text and the numbers, but still no words for the first row.

So I have a few questions:

1) How can you build a Regex expression when the rows you want to parse sometimes have the parts you need and sometimes do not (e.g. the top row is all text, the bottom row is all numbers, and the only row that works is the one in between)? In short, why is my expression behaving this way when it is suggested that it shouldn't?

2) Is there any way to make the expression smaller? E.g. both the RegxA and RegxB examples have repeating groups; is there no way of writing the group once and repeating it?

3) regex101 is great but appears not to handle things like this. Are there any other places I could try, to understand Regex better?


Thanks for reading this far!

DawnDuong
13 - Pulsar

hi @Bobbins 

You rightly pointed out that Regex logic is the same or similar across many software packages.

I feel it may help to clarify where you are most likely to use Regex in Alteryx.

  1. The first place (the more standardised one) is likely the REGEX TOOL (https://help.alteryx.com/20213/designer/regex-tool), which can be used in Parse, Match, Tokenise or Replace mode. Consider it the more user-friendly interface.
  2. The second place is the Formula Tool, where you can use the many available regex functions (part of the available String functions, https://help.alteryx.com/designer-cloud/string-functions) any way you wish.

Once you know the environment, it is relatively straightforward to apply your prior knowledge of Regex to the Alteryx environment.

Check out the interactive lessons related to Regex at this link to see sample use cases and applications:

https://community.alteryx.com/t5/Interactive-Lessons/tkb-p/interactive-lessons/label-name/Parsing%20...

I also found these recorded videos very good:

https://community.alteryx.com/t5/Videos/Parsing-for-Intermediate-Users/td-p/66497

https://community.alteryx.com/t5/Videos/Working-with-Strings-in-Alteryx/td-p/43827

 

Cheers,

Dawn

Bobbins
8 - Asteroid

Thanks @DawnDuong, I am trying to use the Regex tool at present; going freestyle in the Formula tool is just a pipe dream for now. Thanks for the links too. I have seen the regex videos, but I feel as if they need more examples rather than si

DawnDuong
13 - Pulsar

hi @Bobbins 

I'd suggest that you check out the example of the Regex Tool (Right-click on the tool).

The example shows you how the 4 settings (Parse, Match, Tokenise, Replace) work differently, which may answer some of the questions you had earlier.
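
As a rough illustration of how the four settings differ, here is a Python sketch using approximate analogues from the `re` module. These are assumptions for illustration only, not how the Regex Tool is implemented: Parse is approximated by reading capture groups, Match by a whole-string test, Tokenise by finding every occurrence of one pattern, and Replace by substitution. The sample row and the `number` pattern are taken from the question above.

```python
import re

row = "Myrow 12,000 345,678 910,111,121"
number = r"[\d,]+"  # one run of digits and commas

# Parse (approximation): one output column per capture group.
parsed = re.search(rf"({number})\s({number})\s({number})", row).groups()
print(parsed)  # ('12,000', '345,678', '910,111,121')

# Match (approximation): does the whole field fit the pattern? True or False.
is_match = re.fullmatch(rf"\D*\s{number}\s{number}\s{number}", row) is not None
print(is_match)  # True

# Tokenise (approximation): write the pattern once and collect every occurrence.
tokens = re.findall(number, row)
print(tokens)  # ['12,000', '345,678', '910,111,121']

# Replace (approximation): substitute whatever the pattern matches.
replaced = re.sub(r",", "", row)
print(replaced)  # Myrow 12000 345678 910111121
```

The Tokenise-style call is also one way to avoid repeating the same group three times, since the pattern is written once and applied as many times as it occurs.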

Once you are confident with using the Regex Tool, then using the free form one will come more easily.

Check out the Weekly challenges as well - those marked under "Data Parsing" typically require Regex.

Dawn.
