

How to parse data using RegEx with my data?

khadijahneddy
5 - Atom

khadijahneddy_0-1604367353853.png

Based on the picture attached above, how can I parse these values using RegEx when they are not all the same length? I want to separate the country and the technology name in the cluster name into new columns.

 

Eg: Melbourne | Monash University

    : Guangzhou South China | Univ. of Technology 

    : Hartford, CT | United Technologies

 

Thank you in advance for your attention and help!

 

 

DavidP
16 - Nebula

Hi @khadijahneddy 

 

What you need is a pattern in the data on which to build parsing rules, but I don't see a pattern on which a single generic rule can be built.

 

The next option is to build a pattern that catches and parses SOME records, filter those out, build another rule for the next set, and so on until you've accounted for every possibility.

 

This is obviously not ideal, as your rules will need to be maintained and you'll need checks to ensure nothing falls through the cracks.

 

For instance, one rule could cover records that start with a word followed by a comma, a space, and 2 capital letters: that part becomes parsed field 1, and everything after it becomes parsed field 2.
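As a rough illustration of that first rule outside Alteryx, here is a minimal Python sketch. The regular expression is my own reading of the rule, and the names `RULE_CITY_STATE` and `parse_city_state` are hypothetical. It also assumes the raw value looks like "Hartford, CT United Technologies", with no separator between the two parts.

```python
import re

# Assumed reading of the rule: one word, a comma and a space, then exactly
# two capital letters form field 1; everything after the next whitespace
# is field 2.
RULE_CITY_STATE = re.compile(r"^(\w+, [A-Z]{2})\s+(.+)$")


def parse_city_state(cluster_name):
    """Return (field_1, field_2) if the record fits the rule, else None."""
    match = RULE_CITY_STATE.match(cluster_name.strip())
    return match.groups() if match else None
```

With this pattern, "Hartford, CT United Technologies" splits into "Hartford, CT" and "United Technologies", while "Melbourne Monash University" does not match and is left for another rule.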

 

Another rule could cover records that consist of only 2 words separated by a space: parse each word into its own field.

 

I'm attaching an example with these 2 rules to show how you would build it up.

 

At the end, you can use a Union tool to bring them all back together.
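The overall approach (try a rule, set the matches aside, pass the rest to the next rule, then combine everything) can be sketched in Python as below. This is only an illustration of the logic, not the attached workflow: the patterns are my assumed readings of the two rules, `RULES` and `parse_records` are hypothetical names, and "Tokyo Sony" is a made-up record used to exercise the two-word rule.

```python
import re

# Ordered list of (rule name, pattern). Each pattern captures two fields.
# Both patterns are assumptions about how the two rules might be written.
RULES = [
    # A word, comma, space, two capital letters -> field 1; the rest -> field 2.
    ("city_state", re.compile(r"^(\w+, [A-Z]{2})\s+(.+)$")),
    # Exactly two words separated by whitespace -> one word per field.
    ("two_words", re.compile(r"^(\S+)\s+(\S+)$")),
]


def parse_records(records):
    """Apply the rules in order; return (parsed rows, unmatched records)."""
    parsed = []     # plays the role of the final union of all rule outputs
    unmatched = []  # records no rule caught, so nothing is silently lost
    for record in records:
        text = record.strip()
        for rule_name, pattern in RULES:
            match = pattern.match(text)
            if match:
                parsed.append((text, rule_name, match.group(1), match.group(2)))
                break
        else:
            # No rule matched: keep the record for review or a new rule.
            unmatched.append(text)
    return parsed, unmatched
```

Calling `parse_records` on "Hartford, CT United Technologies", "Tokyo Sony" and "Guangzhou South China Univ. of Technology" parses the first two and returns the third in the unmatched list, which is the check that nothing falls through the cracks.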

 

DavidP_0-1604412642118.png