Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

RegEx - How to Account for Extra Information and Missing delimiters

hellyars
13 - Pulsar

RegEx (again)

 

This time I want to extract Company Name, City, and State.  My current solution uses a RegEx tool set to parse to 3 output fields using the following expression 

(^.*?)\,.*?([[:upper:]].*?)\,.*?([[:upper:]].*?)\,.*?   This expression is highly dependent upon the entries sticking to the pattern and using the commas to separate the desired fields.

 

 

Company Name, Company City, Company State, ipsum dolor sit amet, awarded a $1,999,999,999 consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor awarded a not to exceed $3,555,678 in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

 It works until the source data does not follow its own pattern.

 

Here is an example of a problem entry.   There are two issues.  One, they inserted the name of a business unit between the company name and city.   Second, they left off the comma after the state name so the expression does not know to stop or it only captures the New in New York.  What can I do?

 

Company Name, Business Unit, Company City, Company State ipsum dolor sit amet, awarded a $1,999,999,999 consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor awarded a not to exceed $3,555,678 in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Thanks,

 

2 REPLIES 2
Thableaus
17 - Castor
17 - Castor

Hi @hellyars 

 

Unfortunately RegEX works with patterns.

 

That means you gotta know what's going on with your data to apply them. It's almost impossible to apply a Universal RegEX to any variation.

If this situation of variance happens over and over again, then one RegEX will hardly solve things out for you.

So in some cases, you have to break your data prep in some steps.

 

I think the first thing to do is to study your data. Use Data Investigation Tools to analyze what's the most common pattern.

Is just one time that Business Unit comes up to break the pattern? Does this situation happen more than once?

 

Cheers,

hellyars
13 - Pulsar

This is a less than 5% issue.  I am trying to account for every scenario.  That might be a bridge too far -- you can't control everything.  It would be easier to identify the values in the output field that don't conform and work specific fixes for those isolated scenarios, etc.

 

Thanks.

Labels