Get Inspire insights from former attendees in our AMA discussion thread on Inspire Buzz. ACEs and other community members are on call all week to answer!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Extract second iteration of Name, City and State

hellyars
13 - Pulsar

Another day, another RegEx struggle...

 

I want to pull the name, city and state for an organization issuing a contract.   This appears towards the end of the text and is separated by commas (99% of the time).  It is also always followed by the phrase "contracting activity."

 

The problem is the body of text begins with the awardee's name, city and state using the same comma separated construct.  I know how to capture crudely the awardee information using the expression

 

(^.*?)\,.*?([[:upper:]].*?)\,.*?([[:upper:]].*?)\,.*?  

 

But how do I isolate and extract the Org name, city, and sate information. Can I use contracting activity as an anchor to look back or is there another approach?   

 

Awardee name, city, and state, ipsum dolor sit amet, awarded consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco Random Capitalized Word laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt Org Name, Org City, Org State, is the contracting activity id est laborum.

 

I tried using the expression \bawarded.*([[:upper:]].*)\,\s([[:upper:]].*)\,\s([[:upper:]].*)\,.*$. to avoid the initial awardee information.   I can zero in on city and state.  But, I am having difficulty with isolating name.  The reason being that name can be 2, 3, 4, 5 or 6 + worlds.   For example, the name might be U.S. Army Contracting Command or Naval Sea Systems Command, etc.  

 

I would love to figure out how to do this with RegEx.  But my alternative is to build a lookup table.  Ugh.


Thanks,

12 REPLIES 12
Thableaus
17 - Castor
17 - Castor

Hi @hellyars 

 

Does this work for you?

 

.*(?=\b\w+\s\w+,)(.*,.*,.*)(?=,.*contracting\sactivity).*

 

REGEX_Replace([Field1], ".*(?=\b\w+\s\w+,)(.*,.*,.*)(?=,.*contracting\sactivity).*", "$1")

 

The pattern I took was - 3 commas interpolated by some words, something else, contracting activity.

 

Cheers,

hellyars
13 - Pulsar

 

@Thableaus 

 

.*(?=\b\w+\s\w*,)(.*,.*,.*)(?=,.*contracting\sactivity).*

 

It is almost there. And, this is where I ran into trouble with my expression.   How can we adjust the expression to capture all capitalized name segments?  The name will always consist of at least two or more segments.

 

Is there a way to make it find the first capitalized word and then all the capitalized words (or characters in the case of abbreviations such as U.S.) between it and  the first common that is then followed by the rest of the pattern (city, state )?

 

Screen Shot 2019-02-27 at 11.36.46 AM.png

Thableaus
17 - Castor
17 - Castor

@hellyars 

 

.*?((?:\b[A-Z]\w*\W*)+,.*,.*)(?=,.*contracting\sactivity).*

 

Try this.


Cheers,

hellyars
13 - Pulsar

Close.   Here is the good news.  It works.  I modified it slightly to split name, city, and state.  But, both versions work.  That is, they work on regex101.com environment.  I can't get them to work in Alteryx.   In Alteryx, its grabbing the awardee name, city, state at the start of the text.  It seems to be ignoring everything that works so well on regex101.com.

 

 

Screen Shot 2019-02-27 at 2.19.16 PM.png

Thableaus
17 - Castor
17 - Castor

@hellyars 

 

Use Parse Method in RegEX tool and it will work. Be sure to uncheck the "Case Insensitive" box.


RegEX_Replace in this case is a bit tricky because it tries to replace a part of the string, so the engine works differently. 

hellyars
13 - Pulsar

It does and does not.  

 

If I take the exact example I used in this post and pair it with a RegEx tool using a Text Input, it works.


If I add it to my real data, it does not work.  

 

And this is it where it gets strange.  If I copy a body of text from a field in the real data and add it to the Text Input in my first test run, it works.

 

Huh? 

Thableaus
17 - Castor
17 - Castor

@hellyars 

 

Could you share your original data? At least filtering where it does not work, so we can compare this situation.


Cheers,

hellyars
13 - Pulsar

@Thableaus 

 

Here is an extract from the real data.   It will not work hooked up to the RegEx tool.  But, you can copy and paste records to the Input Tool and it will work. 

 

 

Thanks

Thableaus
17 - Castor
17 - Castor

@hellyars 

 

No file was appended. Please review your post.

 

Cheers,

Labels