Based on the picture attached above, how can I parse these sentences using RegEx when they are not all the same length? I want to separate the country and the technology name in the cluster name into new columns.
What you're looking for is a pattern you can build rules around to parse the data, but I don't see a pattern consistent enough to support one generic rule.
The next option is to build a pattern that catches and parses SOME of the records, filter those out, build another rule for the next set, and so on until you've accounted for every possibility.
This is obviously not ideal, as your rules will need to be maintained and you'll need checks to ensure nothing falls through the cracks.
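One such check is to run every rule against every record and flag records that no rule matches (they fall through the cracks) or that several rules match (the result then depends on rule order). Below is a minimal Python sketch of that idea. It assumes the two example rules described in the next paragraphs, written as regular expressions, and the cluster names are hypothetical because the original picture isn't available here.

```python
import re

# Two example rules, written as regexes (an interpretation of the rules in this thread).
RULES = {
    # A word, a comma and a space, two capital letters, then whatever follows.
    "rule_1": re.compile(r"^(\w+, [A-Z]{2})\b\s*(.*)$"),
    # Exactly two space-separated words.
    "rule_2": re.compile(r"^(\S+) (\S+)$"),
}

def coverage_report(names):
    """Map each name to the list of rule names whose pattern matches it."""
    return {name: [rule for rule, rx in RULES.items() if rx.match(name)] for name in names}

# Hypothetical cluster names for illustration only.
names = ["Dallas, TX LTE Cluster", "Germany 5G", "Dallas, TX", "North Spain 4G Urban"]
for name, hits in coverage_report(names).items():
    if not hits:
        print(f"UNMATCHED: {name}")
    elif len(hits) > 1:
        print(f"AMBIGUOUS: {name} -> {hits}")
```

With these sample names, "Dallas, TX" is matched by both rules and "North Spain 4G Urban" by neither, which is exactly what the check is meant to surface.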
For instance, one rule could handle records that start with a word followed by a comma, a space, and 2 capital letters, taking that part as Parsed field 1 and everything that follows as Parsed field 2.
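As a rough illustration, that first rule could be expressed as a regex with two capture groups, one per output field. This is a Python sketch under one reading of the rule (the "word, XX" part is field 1). The `\b` keeps the two capitals from being the start of a longer word, and the sample names are hypothetical.

```python
import re

# Field 1: a word, a comma and a space, then two capital letters as a whole token.
# Field 2: everything after that (leading whitespace dropped).
RULE_1 = re.compile(r"^(\w+, [A-Z]{2})\b\s*(.*)$")

def apply_rule_1(name):
    """Return (field1, field2) if the name fits rule 1, else None."""
    m = RULE_1.match(name)
    if m is None:
        return None
    return m.group(1), m.group(2)

# Hypothetical cluster names; the original picture is not available.
print(apply_rule_1("Dallas, TX LTE Cluster"))  # ('Dallas, TX', 'LTE Cluster')
print(apply_rule_1("Germany 5G"))              # None
```

The pattern only uses basic regex syntax, so it should carry over to other regex engines with little or no change.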
Another example is records that consist of only 2 words separated by a space: parse each word into its own field.
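A comparable sketch for the two-word rule, again in Python with hypothetical sample names: the anchors `^` and `$` make sure the whole record is exactly two space-separated tokens, so longer names are rejected rather than partially parsed.

```python
import re

# Exactly two non-space tokens separated by a single space; each becomes a field.
RULE_2 = re.compile(r"^(\S+) (\S+)$")

def apply_rule_2(name):
    """Return (field1, field2) if the name is exactly two words, else None."""
    m = RULE_2.match(name)
    return (m.group(1), m.group(2)) if m else None

# Hypothetical cluster names for illustration only.
print(apply_rule_2("Germany 5G"))              # ('Germany', '5G')
print(apply_rule_2("Dallas, TX LTE Cluster"))  # None
```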
I'm attaching an example with these 2 rules to show how you would build it up.
At the end you can then use a Union tool to bring them all back together.
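Outside a workflow tool, the same filter-then-union idea can be sketched in a few lines of Python: each rule only sees the records that earlier rules did not parse, the per-rule outputs are then stacked into one table (the "union"), and whatever is left over is returned so it can be reviewed. The rules and cluster names are hypothetical stand-ins, not the ones from the attached example.

```python
import re

# Ordered (rule name, pattern) pairs; each pattern has two capture groups.
RULES = [
    ("rule_1", re.compile(r"^(\w+, [A-Z]{2})\b\s*(.*)$")),
    ("rule_2", re.compile(r"^(\S+) (\S+)$")),
]

def parse_in_stages(names):
    """Apply rules in order; each rule only sees records no earlier rule parsed.

    Returns (parsed_rows, leftovers), where parsed_rows is the union of
    every stage's output and leftovers are records no rule matched.
    """
    remaining = list(names)
    stages = []
    for rule_name, rx in RULES:
        matched, unmatched = [], []
        for name in remaining:
            m = rx.match(name)
            if m:
                matched.append({"name": name, "field1": m.group(1),
                                "field2": m.group(2), "rule": rule_name})
            else:
                unmatched.append(name)
        stages.append(matched)
        remaining = unmatched  # only unparsed records flow to the next rule
    # Union: stack every stage's output back into one list of rows.
    parsed_rows = [row for stage in stages for row in stage]
    return parsed_rows, remaining

# Hypothetical cluster names for illustration only.
names = ["Dallas, TX LTE Cluster", "Germany 5G", "Dallas, TX", "North Spain 4G Urban"]
rows, leftovers = parse_in_stages(names)
for row in rows:
    print(row)
print("Not parsed:", leftovers)
```

Note that the output is grouped by rule rather than kept in input order, and that rule order matters: "Dallas, TX" is claimed by the first rule even though the second would also match it.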