ALTERYX INSPIRE | Join us this May for for a multi-day virtual analytics + data science experience like no other! Register Now
The Alteryx Community has implemented a new single sign-on experience. If you are having issues with your account, please review the FAQ's.

Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

Parsing Text using RegEx

5 - Atom

I have a column with a category code followed by a category desc, however some use a " - " to separate them and others simply are separated by a space.




12345 - Red Apples S
34567 Blue Oranges

I am trying to separate into 2 columns. Category Code and Category Desc.

I am able to separate these using 2 RegEx tools replace and parse. However this make my workflow very slow. Is there a better, perhaps simpler way of doing this that does not impact performance too badly?

11 - Bolide
11 - Bolide

Hi @JasonDuRandt,


Not sure about performance at scale as i don't have data for this, but you can consider for splitting your data into 2 streams: one containing the hyphen character and one without, you could use 'Text To Columns' and split to 2x columns and cleanup up using 'Data Cleansing'.





#Excuse me, do you speak Alteryx?


Hi @JasonDuRandt,


what @rafalolbert suggested or you could avoid the split, just remove the '-' with the Data Cleansing tool and then use Text to Column to separate on space "\s" (plus a following Data Cleansing tool to remove leading spaces)


See attached.


Hope this helps!




12 - Quasar

Limited testing due to only two examples, but you could use one formula tool to create the fields you're looking for, still using RegEx.


Your first created field (Category Code) would be:



 And your second (Category Description) would be:




This may be quicker than two separate RegEx tools, but I haven't tested it. 

12 - Quasar

Hi @JasonDuRandt ,

Check this link   for  regular expression for  these 2 examples .

12345 - Red Apples S
34567 Blue Oranges   



1st group is numbers

2nd  group is  non capturing group which may  occur  zero or 1 time

3rd group is whatever follows previous group