I have a column with a category code followed by a category desc, however some use a " - " to separate them and others simply are separated by a space.
Example:
12345 - Red Apples S
34567 Blue Oranges
I am trying to separate into 2 columns. Category Code and Category Desc.
I am able to separate these using 2 RegEx tools replace and parse. However this make my workflow very slow. Is there a better, perhaps simpler way of doing this that does not impact performance too badly?
Solved! Go to Solution.
Hi @JasonDuRandt,
Not sure about performance at scale as i don't have data for this, but you can consider for splitting your data into 2 streams: one containing the hyphen character and one without, you could use 'Text To Columns' and split to 2x columns and cleanup up using 'Data Cleansing'.
Thanks,
Rafal
#Excuse me, do you speak Alteryx?
Hi @JasonDuRandt,
what @rafalolbert suggested or you could avoid the split, just remove the '-' with the Data Cleansing tool and then use Text to Column to separate on space "\s" (plus a following Data Cleansing tool to remove leading spaces)
See attached.
Hope this helps!
Giuseppe
Limited testing due to only two examples, but you could use one formula tool to create the fields you're looking for, still using RegEx.
Your first created field (Category Code) would be:
REGEX_Replace([Field1],"(\d+)(.*)","$1")
And your second (Category Description) would be:
REGEX_Replace([Field1],"(\d+\s-*\s*)(.*)","$2")
This may be quicker than two separate RegEx tools, but I haven't tested it.
Hi @JasonDuRandt ,
Check this link for regular expression for these 2 examples .
12345 - Red Apples S
34567 Blue Oranges
https://regex101.com/r/beerWL/1
(\d+)\s(?:-\s)?(.*)
1st group is numbers
2nd group is non capturing group which may occur zero or 1 time
3rd group is whatever follows previous group