Alteryx Designer Desktop Discussions

timothysmith_NYFRB · ‎12-21-2021

I am using the text to columns tool with a colon (:) delimiter. There are several different descriptions in one (cell), but the issue is a result has the colon in the description. Is there a way to use the delimiter but not break out the result that also has the colon
I am thinking I may have to use find replace (word...colon with underscore...word), but I am concerned I may not always know the result that needs to be replaced

For example the cell has [Source ID: VRATES C:D] and {Account Code: GCBCAPSC:D0] and therefore produces 6 fields, but I really want the 4 (ignore the 2nd colon)

christine_assaad · ‎12-21-2021

Hi @timothysmith_NYFRB

I am not sure how the data looks like, but given your example, you can configure the tool to create only 2 columns, as below so you don't lose the other colon.

The other option is to use Regex like below.

^([^:]+) means one or more character without : --> this should be the header

: is literally for the : sign

(.*) anything else --> should be the value

Hope this helps. Cheers!

timothysmith_NYFRB · ‎12-21-2021

Thank you for the suggestion

Unfortunately, the source data often contains 10-15 (never know how many) different tags with several colons within it and I never know what position the colon I want to keep is located within the result

What I did, as I look for a more elegant solution, was to identify the exceptions and did a find/replace of the second colon with an underscore (for the colon within the result)

This is not ideal because it does not account for results I do not know about

Thank you again for the suggestion

MarqueeCrew · ‎12-21-2021

@timothysmith_NYFRB ,

here's a regular expression that will turn the first : into a |

regex_replace([stuff],"(.*?):(.*)",'$1|$2')

now you can parse on the pipe.

cheers,

mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.

timothysmith_NYFRB · ‎12-21-2021

I appreciate the suggestion and I have tried it several ways but replacing the 1st colon with a bar only seems to replace the 1st colon as expected and leaves the second colon alone (as expected)

The issue is there are many more components that are separated by a legitimate colon (the regex expression leaves them alone) which means when I parse again, I have my original issue of breaking up the result by the 2nd colon

I tried by adding another regex expression (same as the 1st, but instead of a bar replace with an underscore (something I would keep as part of the result), but when I did that, it replaced (3rd colons) where I did not want them replaced because they are legitimate delimiters (not part of the result)

I tried to mess with the regex expression to replace the 2nd colon with an underscore (or bar for that matter) so that I could parse by colons after I resolved the 2nd colon problem, but the results are not working as I hoped

Finally, I tried to add a delimiter after the regex expression and go from there, but later on down the line, another errant colon appears

Since I never know where in the string the errant colon is and where the colons I want are, the regex expression helped with most, but not the one's later on down

Ben_H · ‎12-22-2021

Hi @timothysmith_NYFRB,

Can you mock up some example data that we can look at?

Regards,

Ben

timothysmith_NYFRB · ‎12-22-2021

I appreciate all the help, the regex expression provided works very well so long as it just deals with the first instance of two colons

What I did was used the regex expression provided, added a text to columns tool , leaving the extra characters in the last column and then added another regex expression on that new 'extra' characters column, added text to columns tool, etc.

That seems to work so long as I have enough repeating regex expressions and text to columns tools, but if I do not have enough, I have an issue

Plus this does not seem very efficient because I have to repeat the tools sequence so many times

I have attached a mock up source data file and the workflow I am using. My issue is with the 1st record, fields WT018-WT020. The spend category name is separated (because of the colon in the name) so the only way I know to resolve is to add another regex expression and text to columns tool, but then what happens if I get data that has 20 components in it and 19 is the one with 2 colons....

I really appreciate your help, it is fun to learn this stuff

Ben_H · ‎12-22-2021

Hi @timothysmith_NYFRB,

I made a slight modification to your process that I think might be getting somewhere.

The initial regex you did on new lines I've swapped for a text to rows tool.

Then removed the empty values, and applied the regex above. By doing text to rows you end up with one column to apply the regex to which looks to have worked out ok.

You can then cross-tab back into a wide format.

See attached example (you'll probably have to point it to the right input file)

Is that closer to your desired output?

Regards,

Ben

timothysmith_NYFRB · ‎12-22-2021

Genius, that text to column (split to rows) worked perfectly with the regex expression

Not only did that solve my issue of multiple sets of (regex) tools and never knowing if I had enough, but it also eliminated me using several transpose, sort, numbering, formulas, join, union, etc. tools

Using the tools as you configured them allowed me to go from 35 tools (and 1 manual text input tool, ugh) to get the data (and I likely would have to add more regex tool sets) to 20 tools (and no text input tool, yeah) to accomplish the same results AND also cut my workflow run-time in half

Thank you very much for the support, this amazes me

Alteryx Designer Desktop Discussions

Delimiter