Important Community update: The process for changing your account details was updated on June 25th. Learn how this impacts your Community experience and the actions we suggest you take to secure your account here.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

RegEx Parse Help

elclark
8 - Asteroid

Hello - I'm trying to parse these fields into 5 columns using RegEx, however, sometimes the data in column 5 is blank. The spacing between each field is not consistent as its a PDF converted to text.

sdfrgh    Doe, Jane   Yes    Sales Rep I   Disney-Land
123566      Smith, John No  Consultant   Disney-World,Disney-Land
12dvs6    Smith, Jane   Yes   Sales Rep

 

Desired Output

sdfrghDoe, JaneYesSales Rep IDisney-Land
123566Smith, JohnNoConsultant   Disney-World,Disney-Land
12dvs6Smith, JaneYesSales Rep 

 

9 REPLIES 9
Qiu
21 - Polaris
21 - Polaris

@elclark 
It appears to be difficult for me to tell the diifference between last 2 columns.
Any chance that the last column always starts from Disney or null?

flying008
14 - Magnetar

Hi, @elclark 

 

A dynamic resolution for you :

 

And if your first line of input isn't the field name, it's easier to deal with.

 

录制_2024_02_28_09_27_23_680.gif

gawa
15 - Aurora
15 - Aurora

hi @elclark 

As @Qiu says, it is difficult to parse the last 2 columns. My idea is:

First, parse data by the following Regex. You will get total 4 columns(last 2 columns are still combined)

In the 3rd column, value should be Yes or No so its nature can be used as a good anchor in regex expression.

^(.+?)\s{2,}(.+?)\s+(Yes|No)\s{2,}(.+)$

Second, parse the 4th column by Text to Column tool into 2 columns, and you will get the desired output.

 

image.png

flying008
14 - Magnetar

Hi, @elclark 

 

or parse by the regex to 1 step:

Spoiler
^(.+?)\s{2,}(.+?)\s+(Yes|No)\s{2,}((?:\w+\s)*\w+)($|\s{2,}.+?$)

 

 

elclark
8 - Asteroid

@flying008 the dynamic resolution almost worked, the scenario I ran into is sometimes the name section is blank but the format is consistent, unique id followed by name (or a blank space where the name should be) and then followed by either No or Yes (and this field is never blank and only those options).

flying008
14 - Magnetar

Hi, @elclark 

 

Maybe you can upload the fail case data for test.

elclark
8 - Asteroid

@Qiu The last column always starts with 1 of 9 locations (so like Disney-xxx, Paris-xxxx, Rome-xxx, etc, those 9 locations are always the same but what follows after the - can be different) or its Null

Qiu
21 - Polaris
21 - Polaris

@elclark 
I have to go through a lot to deal the last two columns and I hope your data is not so big since I am using Append tool.

@gawa 

Maybe you have better idea? 😁

0229-elclark.png

Labels