Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

RegEx to Cleanse Data

knnwndlm
8 - Asteroid

Hi SME,

 

I have the following attached file with the name of the City in column A that I would like to use RegEx to remove extraneous info such as State references.  The result that I'm looking for is shown in column B.

 

I haven't been able to use RegEx to cover all the cases as I'm still learning RegEx.  When I use \W+\s*\w* in the RegEx tool, I ended not capturing all the cities with multiple words.  Could you please show me how to do this in RegEx?

 

Thanks,

kwl

5 REPLIES 5
ArtApa
Alteryx
Alteryx

Hi @knnwndlm - Here is a possible solution:

Cleanse Data AA.jpg

Qiu
21 - Polaris
21 - Polaris

@knnwndlm 
Cleansing city names can be very tricky.
I would suggest to use a master list of all cities then Use Find and Replace, rather than Regex

https://simplemaps.com/data/us-cities

0124-knnwndlm.png

knnwndlm
8 - Asteroid

Thank you Both!

 

@Qiu - I was looking at Los Angeles County and noticed that some cities did not have zip codes.  Why is that the case?

knnwndlm
8 - Asteroid

@ArtApa - Could you please help me understand the RegEx "\s[a-zA-Z]{2}\b"?  I could understand the individual pieces in the RegEx but I couldn't figure out why the results were generated the way it did.  

 

I just noticed that N. Palm Springs became N.  How can I modify the RegEx to incorporate this case?

 

Thank you,

kwl

knnwndlm
8 - Asteroid

@ArtApa - When I use the RegEx above, I got "Santa Fe Springs" converted to "Santa Springs."  How can I modify it so that it doesn't do that?

Labels