Dev Space

khadijahneddy · ‎11-02-2020

Based on the picture attached above, how can I parse these strings using RegEx when they are not all the same length? I want to separate the country and the technology name in the cluster name into new columns.

Eg: Melbourne | Monash University

: Guangzhou South China | Univ. of Technology

: Hartford, CT | United Technologies

Thank you in advance for your attention and help!

DavidP · ‎11-03-2020

Hi @khadijahneddy

What you're looking for is a pattern you can build parsing rules on, but I don't see one in this data that a single generic rule could be built on.

The next option is to build a pattern that catches and parses SOME records, filter those out, build another rule for the next set, and so on until you've accounted for every possibility.

This is obviously not ideal, as your rules will need to be maintained and you'll need checks to ensure nothing falls through the cracks.

For instance, one rule could cover records that start with a word followed by a comma, a space and 2 capital letters: that part becomes parsed field 1, and everything that follows becomes parsed field 2.

Another example is records that have only 2 words separated by a space: parse each word to its own field.

I'm attaching an example with these 2 rules to show how you would build it up.

At the end you can then use a Union tool to bring them all back together.
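
For anyone who wants to see the same idea outside a workflow, here is a rough Python sketch of the two rules above, applied in order and then combined back into one list. It is not the attached example, only an illustration. The raw strings are assumptions (the original data is only visible in the picture), "Sydney Telstra" is a made-up two-word record, and the names `RULES` and `parse_records` are hypothetical.

```python
import re

# Rule 1: a word, then a comma, a space and 2 capital letters (e.g. "Hartford, CT")
# as field 1, and everything that follows as field 2.
RULE_1 = re.compile(r"^(\w+, [A-Z]{2})\s+(.+)$")

# Rule 2: exactly 2 words separated by a single space, one word per field.
RULE_2 = re.compile(r"^(\S+) (\S+)$")

# Rules are tried in this order; the first one that matches wins.
RULES = [("rule_1", RULE_1), ("rule_2", RULE_2)]


def parse_records(records):
    """Return (parsed, unmatched).

    parsed holds (record, rule_name, field_1, field_2) tuples from all rules,
    which plays the role of the Union step. unmatched holds the records no
    rule caught, so nothing falls through the cracks silently.
    """
    parsed = []
    unmatched = []
    for record in records:
        text = record.strip()
        for name, pattern in RULES:
            match = pattern.match(text)
            if match:
                parsed.append((text, name, match.group(1), match.group(2)))
                break
        else:
            # No rule matched: keep the record for the next rule you write.
            unmatched.append(text)
    return parsed, unmatched


# Assumed sample input, loosely based on the examples in the question.
sample = [
    "Hartford, CT United Technologies",
    "Sydney Telstra",
    "Melbourne Monash University",
]

parsed, unmatched = parse_records(sample)
for row in parsed:
    print(row)
print("still unmatched:", unmatched)
```

Whatever ends up in `unmatched` is the set you would write the next rule for, which is the maintenance cost mentioned earlier.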


How to parse data using RegEx with my data?