Alteryx Designer Desktop Discussions

knnwndlm · ‎09-20-2022

Hi RegEx Pro,

I'm trying to build a RegEx to tease out the name of the city without the extraneous info at the end. For example, here's my table:

HESPERIA CA

HESPERIA, CA

HESPERIA. CA

HESPERIA,, CA,

BOULDER& NV

BOULDER&NV

What I want to capture is HESPERIA and BOULDER leaving out the characters. The way I can tell that it's the end of the city name is that we have char(s) and/or space(s) before CA or NV. I've been doing this the long way by using RegEx and Text To Columns with one character at a time. I'm trying to figure out a way to factor in other miscellaneous characters before CA or NV that I might not be aware of.

Could somebody help me figuring out an expression in RegEx that will encapsulates all the miscellaneous characters before CA or NV and leaves the city name untouched?

Thanks,

K

DataNath · ‎09-20-2022

How do you get on with the following, @knnwndlm? You’d need to be using the RegEx tool in Parse mode:

(\w+).+

ConnorK · ‎09-20-2022

Hi @knnwndlm,

Another item you might try is cleaning all punctuation with a Data Cleansing tool, then trimming off the two characters on the right of your original value. By doing it this way there is no Regex required, and you can pull cities with a space in the name cleanly as well. Please see the below workflow for an example.

Best,

Connor Kelleher
Senior Sales Engineer
Alteryx

knnwndlm · ‎09-20-2022

Hi @DataNath ,

For New York, (\w+).+ only picks up New and left the York out. Also, I'm not sure I understand the entire expression there. I'm just foraying into RegEx. Could you please help me understand the expression?

Thanks,

K

knnwndlm · ‎09-20-2022

Thanks @ConnorK!

If I were to do this using RegEx, what's the best approach to cover all the punctuations? I'm trying to learn RegEx and that's why I want to use it in my workflow.

DataNath · ‎09-20-2022

Hey @knnwndlm, does the following get you any closer?

(.+\w).+\w{2}

knnwndlm · ‎09-20-2022

Thanks @DataNath!

This works! Could you please help me understand the expression as I'm still learning it?

DataNath · ‎09-20-2022

No problem @knnwndlm, happy to help! The current expression breaks down as follows:

(.+\w) < Captures all characters up until the last occurrence of a word character (denoted by \w). The reason it doesn't continue on until the other letters i.e. NY/CA/NV is because after the capture group i.e. everything bracketed, we state:

.+ < One or more of any character (the . is a wildcard for any character other than new line, the + means 'one or more'), followed by:

\w{2} < Two (made explicit by the {2}) word characters - in this case the state abbreviations.

We could've also used something like (.+\w).+[A-Z]{2} to be more specific to 2 capital letters but for a use like this I thought \w would be fine. Whilst picking up RegEx, and even for referring back to often when getting the hang of it, cheat sheets like so are always great:

https://cheatography.com/davechild/cheat-sheets/regular-expressions/

I always tend to test my expressions before running them in Alteryx as well, with my site of preference being https://regex101.com/

knnwndlm · ‎09-20-2022

Thanks @DataNath!

knnwndlm · ‎09-20-2022

Hi @DataNath,

I noticed that some city names were chopped apart based on (.+\w).+\w{2}. Here are a few examples:

ONTARIO becomes ONTA

COMPTON becomes COMP

TULARE becomes TUL

RANCHO CUCAMONGA becomes RANCHO CUCAMO

MONTEREY PARK becomes MONTERY P

Guessed I forgot to include the case where there's no CA/NV/NY. How do tell RegEx to ignore such cases?

Thanks,

K

Alteryx Designer Desktop Discussions

RegEx

Regex Line Extraction

Tool Mastery | RegEx

REGEX

RegEx tool

Regex Formula

[SHARING] How to: Dynamic Formula Example with new...

Re: Need to check if we can activate container bas...

Re: Date Time Function - Prioritization Base on Du...

Re: Regex Line Extraction

Re: Rolling 12 months dates