community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

How to extract first five words in a sentence after a key word?

Meteoroid

Hello,

 

New to using regex.

 

I'm trying to extract the first 5 words after a key word. I've used a simple regex to get everything after a key word - (replac)(.*)       - but I'd like to narrow it down to up to the first 5 words (names range from 2 to 5 words).  for example:

 

replace John L. Robert Smith Jr. as the new agent etc....

 

Note: punctuation isn't always there.

 

Also, sometimes the key word could be replace, replaced or replacing ...that's why I used (replac) ...I know there must be a way to put each variable in but not sure how.

 

Thanks in advance!

Kim

Alteryx Certified Partner
Alteryx Certified Partner

Hi @kas 

 

Could give us more examples of your sample data and what your desired output would be?

 

Just to make it more clear.


Cheers,

Meteoroid

Here's a few more examples:

 

replaced Peter J Jones as agent with Darlene Smith

replacing Joan Allen Johnson 444555 with Wendy Jennings per request

replace Steve Cove per agency request with John J. Cummings 345676 

 

Output:

 

Replaced | Peter J Jones

Replacing | Joan Allen Johnson

Replace | Steve Cove

 

Kim

Alteryx Certified Partner
Alteryx Certified Partner

@kas 

 

This could work:

 

REGEX_Replace([Field1], ".*(replac[a-z]+\b)\W*((?:[A-Z][a-z]*\b\W*)+).*", "$1 | $2", 0)

 

5 words.PNG

 

This is advanced Regex. I tested in a small dataset, but you should check with a larger one.


This also has some rules.


Your names must be Titlecased. So, uppercased and lowercased names won't be captured.

 

WF attached

 

Cheers,

Meteoroid

OK, this is close...but, the casing is all over the place. If I make everything upper case which would be the easiest, how would this solution need to be changed?

 

Kim

Alteryx Certified Partner
Alteryx Certified Partner

@kas 

 

Then it's impossible to make a difference between names and regular words.

 

How would RegEX know that it should capture just "Peter J Jones" and not "Peter J Jones as agent.."

 

The names must have something unique, some kind of rule that differentiates them from other words.

 

Cheers,

Meteoroid

Until I get them to regulate how the data is input I don't have a choice. The best I can do then is to request the first 5 words after replace, replacing or replaced. This will narrow things down in the interim that I can do some fuzzy matching afterward to make comparisons.

With that in mind what is best regex for that then to pull just the first five words regardless of case?

 

Thanks again!

Kim

Alteryx Certified Partner
Alteryx Certified Partner

@kas 

 

Then this expression below works fine.

 

REGEX_Replace([Field1], ".*(replac[a-z]+\b)\W*((?:\w+\b\W*){5}).*", "$1 | $2", 0)

 

Cheers,

Meteoroid

Unfortunately it doesn't if the name is all upper case.

 

I apologize; in trying to simplify my question it ultimately left too much for assumption.

Here is a better example of what my strings may look like:

 

Leilani Hernandez (TP60857), 5/9/2019 6:12:56 PM ET: Per owner request replaced LAURA JEAN TEBLAY 423931/cm 35827011 (100%) with ROGER A SEIM 582245/cm 758 1798
Tracy McCarthy (cr54196), 4/29/2019 2:37:09 PM ET:  Received block transfer request from agency 002 replacing Manny Alfonso 811794 / 702 0103.0; 702 0103.1 with Wendy Jennings 240716 / 702 0104.0 See reassignment tracking under ID 411794

Peter Martin (JJ82609), 7/8/2019 11:16:41 AM ET:  Per Agency request, replace JONATHAN TYLER 609103 (50% -S) and SANDRA RODRIGUEZ (50%) with JOHN JOHNSON 599578 (100%)

 

From this I would like to extract the name (which is sometimes all caps or not) after key words "replace" or "replaced" or "replacing".

 

Results to look like this:

Replaced
Laura Jean Teblay
Manny Alfonso
Jonathan Tyler

 

Again, sorry for not being clearer initially...your time and help is appreciated!

Kim

 

 

Highlighted
Aurora

Hi @kas 

 

Since it looks like @Thableaus has stepped out for a minute, i'll offer up a solution.

 

Your actual data is much better than the mock-ups that you originally provided.  With them we can see the end delimiter for the names is actually a series of digits.  This regex formula should work if your data always has these

.*?replac.*? (.*?) \d+

 

Results.png

 

 

Dan

 

Labels