Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Parsing Text using RegEx

JasonDuRandt
5 - Atom

I have a column with a category code followed by a category desc, however some use a " - " to separate them and others simply are separated by a space.

 

Example:

 

12345 - Red Apples S
34567 Blue Oranges

I am trying to separate into 2 columns. Category Code and Category Desc.

I am able to separate these using 2 RegEx tools replace and parse. However this make my workflow very slow. Is there a better, perhaps simpler way of doing this that does not impact performance too badly?

4 REPLIES 4
rafalolbert
ACE Emeritus
ACE Emeritus

Hi @JasonDuRandt,

 

Not sure about performance at scale as i don't have data for this, but you can consider for splitting your data into 2 streams: one containing the hyphen character and one without, you could use 'Text To Columns' and split to 2x columns and cleanup up using 'Data Cleansing'.

 

Thanks,

Rafal

 

#Excuse me, do you speak Alteryx?

GiuseppeC
Alteryx
Alteryx

Hi @JasonDuRandt,

 

what @rafalolbert suggested or you could avoid the split, just remove the '-' with the Data Cleansing tool and then use Text to Column to separate on space "\s" (plus a following Data Cleansing tool to remove leading spaces)

 

See attached.

 

Hope this helps!

Giuseppe

 

 

neilgallen
12 - Quasar

Limited testing due to only two examples, but you could use one formula tool to create the fields you're looking for, still using RegEx.

 

Your first created field (Category Code) would be:

REGEX_Replace([Field1],"(\d+)(.*)","$1")

 

 And your second (Category Description) would be:

 

REGEX_Replace([Field1],"(\d+\s-*\s*)(.*)","$2")

 

This may be quicker than two separate RegEx tools, but I haven't tested it. 

benakesh
12 - Quasar

Hi @JasonDuRandt ,

Check this link   for  regular expression for  these 2 examples .

12345 - Red Apples S
34567 Blue Oranges

 

https://regex101.com/r/beerWL/1   

 

(\d+)\s(?:-\s)?(.*)     

1st group is numbers

2nd  group is  non capturing group which may  occur  zero or 1 time

3rd group is whatever follows previous group

Labels