Happy 8th birthday to the Maveryx Community! Take a walk down memory lane in our birthday blog, and don't miss out on the awesome birthday present that all Maveryx Community members get to take advantage of!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

RegEx

knnwndlm
8 - Asteroid

Hi RegEx Pro,

 

I'm trying to build a RegEx to tease out the name of the city without the extraneous info at the end.  For example, here's my table:

 

HESPERIA CA

HESPERIA, CA

HESPERIA. CA

HESPERIA,, CA,

BOULDER& NV

BOULDER&NV

 

What I want to capture is HESPERIA and BOULDER leaving out the characters.  The way I can tell that it's the end of the city name is that we have char(s) and/or space(s) before CA or NV.  I've been doing this the long way by using RegEx and Text To Columns with one character at a time.  I'm trying to figure out a way to factor in other miscellaneous characters before CA or NV that I might not be aware of.

 

Could somebody help me figuring out an expression in RegEx that will encapsulates all the miscellaneous characters before CA or NV and leaves the city name untouched?

 

Thanks,

K

17 REPLIES 17
DataNath
17 - Castor

How do you get on with the following, @knnwndlm? You’d need to be using the RegEx tool in Parse mode:

 

(\w+).+

ConnorK
Alteryx
Alteryx

Hi @knnwndlm,

 

Another item you might try is cleaning all punctuation with a Data Cleansing tool, then trimming off the two characters on the right of your original value. By doing it this way there is no Regex required, and you can pull cities with a space in the name cleanly as well. Please see the below workflow for an example.

 

Best,

Connor Kelleher
Senior Sales Engineer
Alteryx
knnwndlm
8 - Asteroid

Hi @DataNath ,

 

For New York, (\w+).+ only picks up New and left the York out.  Also, I'm not sure I understand the entire expression there.  I'm just foraying into RegEx.  Could you please help me understand the expression?

 

Thanks,

K

knnwndlm
8 - Asteroid

Thanks @ConnorK!

 

If I were to do this using RegEx, what's the best approach to cover all the punctuations?   I'm trying to learn RegEx and that's why I want to use it in my workflow. 

DataNath
17 - Castor

Hey @knnwndlm, does the following get you any closer?

 

(.+\w).+\w{2}

knnwndlm
8 - Asteroid

Thanks @DataNath!

 

This works!  Could you please help me understand the expression as I'm still learning it? 

DataNath
17 - Castor

No problem @knnwndlm, happy to help! The current expression breaks down as follows:

 

(.+\w) < Captures all characters up until the last occurrence of a word character (denoted by \w). The reason it doesn't continue on until the other letters i.e. NY/CA/NV is because after the capture group i.e. everything bracketed, we state:

 

.+ < One or more of any character (the . is a wildcard for any character other than new line, the + means 'one or more'), followed by:

 

\w{2} < Two (made explicit by the {2}) word characters - in this case the state abbreviations.

 

We could've also used something like (.+\w).+[A-Z]{2} to be more specific to 2 capital letters but for a use like this I thought \w would be fine. Whilst picking up RegEx, and even for referring back to often when getting the hang of it, cheat sheets like so are always great:

 

https://cheatography.com/davechild/cheat-sheets/regular-expressions/

 

I always tend to test my expressions before running them in Alteryx as well, with my site of preference being https://regex101.com/

 

DataNath_0-1663707296187.png

knnwndlm
8 - Asteroid

Thanks @DataNath!

knnwndlm
8 - Asteroid

Hi @DataNath,

 

I noticed that some city names were chopped apart based on (.+\w).+\w{2}.  Here are a few examples:

 

ONTARIO becomes ONTA

COMPTON becomes COMP

TULARE becomes TUL

RANCHO CUCAMONGA becomes RANCHO CUCAMO

MONTEREY PARK becomes MONTERY P

 

Guessed I forgot to include the case where there's no CA/NV/NY.  How do tell RegEx to ignore such cases? 

 

Thanks,

K


Labels