Morning All,
Some may wonder why I have not posted this somewhere outside the Alteryx community; after all, Regex is not unique to Alteryx. However, I try to keep posts about the software I am using in the same place where possible, because if I am asking, others may well have similar questions in time to come.
So, to start off simply, I have been learning to parse the following two rows:
12,000 345,678 910,111,121 |
0 0 0 |
If I use the following code (RegxA for reference):
([\d,]+)\s([\d,]+)\s([\d,]+)
Then we get this:
12,000 | 345,678 | 910,111,121 |
0 | 0 | 0 |
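For anyone who wants to reproduce the RegxA result outside Alteryx, here is a minimal sketch in Python. It assumes Python's `re` module behaves like the Alteryx engine for this simple pattern, which I have not verified in general; the names `REGX_A`, `rows` and `parse_row` are just made up for the illustration.

```python
import re

# RegxA from the post: three runs of digits/commas, each separated by
# exactly one whitespace character.
REGX_A = re.compile(r"([\d,]+)\s([\d,]+)\s([\d,]+)")

rows = ["12,000 345,678 910,111,121", "0 0 0"]


def parse_row(row):
    """Return the three captured groups, or None if the pattern is not found."""
    m = REGX_A.search(row)
    return m.groups() if m else None


parsed = [parse_row(r) for r in rows]
for groups in parsed:
    print(" | ".join(groups))
# 12,000 | 345,678 | 910,111,121
# 0 | 0 | 0
```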
My understanding is that the () lets me group each part the Regex comes across, the \s allows for splitting at the whitespace, and inside the [] the \d matches any digit and the , matches a comma, with the + telling Regex to keep repeating this until the \s breaks it. So far so good! The alternative (RegxB for reference):
(\d*\S*)\s(\d*\S*)\s(\d*\S*)
also works, more on this later. But then we want to include words, so if we add:
Myrow 12,000 345,678 910,111,121 |
0 0 0 |
then RegxA works just fine and ignores the words. RegxB also works but instead gives:
Myrow | 12,000 | 345,678 |
0 | 0 | 0 |
Which as we can see is quite wrong!
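The difference between the two patterns on the row with a word in front can be checked with a short Python sketch. It assumes Python's `re` treats these patterns the same way Alteryx does, and the variable names are only for illustration. The point it shows is that `\d*` is allowed to match zero digits, so in RegxB the `\S*` is free to swallow "Myrow", whereas `[\d,]+` in RegxA needs at least one digit or comma and so skips ahead to the first number.

```python
import re

row = "Myrow 12,000 345,678 910,111,121"

REGX_A = r"([\d,]+)\s([\d,]+)\s([\d,]+)"
REGX_B = r"(\d*\S*)\s(\d*\S*)\s(\d*\S*)"

# RegxA cannot start on a letter, so the search moves on to "12,000".
groups_a = re.search(REGX_A, row).groups()

# RegxB can start on a letter: \d* matches nothing and \S* takes "Myrow".
groups_b = re.search(REGX_B, row).groups()

# Since every digit is also a non-space character, \d*\S* captures
# exactly what \S* alone would capture on this row.
groups_s = re.search(r"(\S*)\s(\S*)\s(\S*)", row).groups()

print(groups_a)  # ('12,000', '345,678', '910,111,121')
print(groups_b)  # ('Myrow', '12,000', '345,678')
```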
Now, before we solve this, to add a bit of complexity, I have added an extra line
Test Piece |
Myrow 12,000 345,678 910,111,121 |
0 0 0 |
Now if we run both, we get just the numbers for RegxA and using RegxB:
Myrow | 12,000 | 345,678 |
0 | 0 | 0 |
What is highly inconsistent here is that the Regex101 website tells me that it should break the words down (despite \d being for digits only) and that the 0s are not picked up, when Alteryx clearly shows that they are!
So if we just consider the text, then if we use the following (RegxC for reference)
(\D*\s\D*)
Then the \D goes for any non-digit, with the * repeating until the \s (for a space), followed by the \D again. Excellent, when used by itself.
But add this to the RegxA example and we are back to ignoring the first line of text, and now the last line of zeros too. Adding this to RegxB gives me the second row of text and the numbers, but still no words for the first row.
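A quick way to see what RegxC captures on each of the three rows is the Python sketch below. It assumes Python's `re` matches the way Alteryx does for this pattern, and `capture_c` is a made-up helper name. One thing it highlights is that `\D` means "anything that is not a digit", so it matches spaces as well as letters.

```python
import re

REGX_C = r"(\D*\s\D*)"


def capture_c(row):
    """Return what group 1 of RegxC captures, or None if nothing matches."""
    m = re.search(REGX_C, row)
    return m.group(1) if m else None


text_row = capture_c("Test Piece")
mixed_row = capture_c("Myrow 12,000 345,678 910,111,121")
zero_row = capture_c("0 0 0")

print(repr(text_row))   # 'Test Piece'
print(repr(mixed_row))  # 'Myrow '
print(repr(zero_row))   # ' '
```

So on the all-number row the pattern still matches, but only a single space between two zeros.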
So I have a few questions:
1) How can you build a Regex when what you want to parse changes, so that the parts you need are sometimes there and sometimes not? (e.g. the top row is all text but the bottom row is all numbers, and the only one that works is the one in between.) In short, why is my code doing this when it is suggested it shouldn't?
2) Is there any way to make the code smaller? e.g. both the RegxA and RegxB examples have repeating groups, but is there no code for writing this once and repeating it?
3) Regex101 is great but appears not to handle things like this; are there any other places I could try to understand Regex better?
Thanks for reading this far!
hi @Bobbins
You rightly pointed out that the Regex logic is the same or similar across many software packages.
I feel it's perhaps helpful to clarify the context in which you will most likely deploy Regex.
Once you know the environment, it is actually relatively straightforward to apply your prior knowledge of Regex to the Alteryx environment.
Check out those interactive lessons related to Regex on this link to view the sample use cases / applications:
I also found these recorded videos very good:
https://community.alteryx.com/t5/Videos/Parsing-for-Intermediate-Users/td-p/66497
https://community.alteryx.com/t5/Videos/Working-with-Strings-in-Alteryx/td-p/43827
Cheers,
Dawn
Thanks @DawnDuong, I am trying to use the Regex tool at present; going freestyle in the formula tool is just a pipedream for now. Thanks for the links. I have seen the regex videos but I feel as if they need more examples rather than si
hi @Bobbins
I'd suggest that you check out the example of the Regex Tool (Right-click on the tool).
The example shows you how the 4 settings (Parse, Match, Tokenise, Replace) work differently - that may answer some of the questions you had earlier.
Once you are confident with using the Regex Tool, then using the free form one will come more easily.
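As a rough way to picture the four settings, here is a sketch of loosely similar operations in Python's `re` module. This is only an analogy under my own assumptions, not a description of how the Regex Tool is implemented, and details such as whether Alteryx anchors a Match the same way are not something this sketch establishes. All variable names are made up for the illustration.

```python
import re

row = "Myrow 12,000 345,678 910,111,121"
number = r"[\d,]+"  # one run of digits and commas

# Parse-like: one output value per capture group.
parse_like = re.search(rf"({number})\s({number})\s({number})", row).groups()

# Match-like: a yes/no flag for whether the pattern is found at all.
match_like = re.search(number, row) is not None

# Tokenise-like: write the small pattern once, collect every occurrence.
tokenise_like = re.findall(number, row)

# Replace-like: substitute whatever the pattern matches (here, drop commas).
replace_like = re.sub(r",", "", row)

print(parse_like)     # ('12,000', '345,678', '910,111,121')
print(match_like)     # True
print(tokenise_like)  # ['12,000', '345,678', '910,111,121']
print(replace_like)   # Myrow 12000 345678 910111121
```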
Check out the Weekly challenges as well - those marked under "Data Parsing" typically require Regex.
Dawn.