Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

RegEx Parse Help

elclark
8 - Asteroid

Hello - I'm trying to parse these fields into 5 columns using RegEx, however, sometimes the data in column 5 is blank. The spacing between each field is not consistent as its a PDF converted to text.

sdfrgh    Doe, Jane   Yes    Sales Rep I   Disney-Land
123566      Smith, John No  Consultant   Disney-World,Disney-Land
12dvs6    Smith, Jane   Yes   Sales Rep

 

Desired Output

sdfrghDoe, JaneYesSales Rep IDisney-Land
123566Smith, JohnNoConsultant   Disney-World,Disney-Land
12dvs6Smith, JaneYesSales Rep 

 

9 REPLIES 9
Qiu
21 - Polaris
21 - Polaris

@elclark 
It appears to be difficult for me to tell the diifference between last 2 columns.
Any chance that the last column always starts from Disney or null?

flying008
14 - Magnetar

Hi, @elclark 

 

A dynamic resolution for you :

 

And if your first line of input isn't the field name, it's easier to deal with.

 

录制_2024_02_28_09_27_23_680.gif

gawa
15 - Aurora
15 - Aurora

hi @elclark 

As @Qiu says, it is difficult to parse the last 2 columns. My idea is:

First, parse data by the following Regex. You will get total 4 columns(last 2 columns are still combined)

In the 3rd column, value should be Yes or No so its nature can be used as a good anchor in regex expression.

^(.+?)\s{2,}(.+?)\s+(Yes|No)\s{2,}(.+)$

Second, parse the 4th column by Text to Column tool into 2 columns, and you will get the desired output.

 

image.png

flying008
14 - Magnetar

Hi, @elclark 

 

or parse by the regex to 1 step:

Spoiler
^(.+?)\s{2,}(.+?)\s+(Yes|No)\s{2,}((?:\w+\s)*\w+)($|\s{2,}.+?$)

 

 

elclark
8 - Asteroid

@flying008 the dynamic resolution almost worked, the scenario I ran into is sometimes the name section is blank but the format is consistent, unique id followed by name (or a blank space where the name should be) and then followed by either No or Yes (and this field is never blank and only those options).

flying008
14 - Magnetar

Hi, @elclark 

 

Maybe you can upload the fail case data for test.

elclark
8 - Asteroid

@Qiu The last column always starts with 1 of 9 locations (so like Disney-xxx, Paris-xxxx, Rome-xxx, etc, those 9 locations are always the same but what follows after the - can be different) or its Null

Qiu
21 - Polaris
21 - Polaris

@elclark 
I have to go through a lot to deal the last two columns and I hope your data is not so big since I am using Append tool.

@gawa 

Maybe you have better idea? 😁

0229-elclark.png

Labels