RegEx
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi RegEx Pro,
I'm trying to build a RegEx to tease out the name of the city without the extraneous info at the end. For example, here's my table:
HESPERIA CA
HESPERIA, CA
HESPERIA. CA
HESPERIA,, CA,
BOULDER& NV
BOULDER&NV
What I want to capture is HESPERIA and BOULDER leaving out the characters. The way I can tell that it's the end of the city name is that we have char(s) and/or space(s) before CA or NV. I've been doing this the long way by using RegEx and Text To Columns with one character at a time. I'm trying to figure out a way to factor in other miscellaneous characters before CA or NV that I might not be aware of.
Could somebody help me figuring out an expression in RegEx that will encapsulates all the miscellaneous characters before CA or NV and leaves the city name untouched?
Thanks,
K
Solved! Go to Solution.
- Labels:
- Regex
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
How do you get on with the following, @knnwndlm? You’d need to be using the RegEx tool in Parse mode:
(\w+).+
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @knnwndlm,
Another item you might try is cleaning all punctuation with a Data Cleansing tool, then trimming off the two characters on the right of your original value. By doing it this way there is no Regex required, and you can pull cities with a space in the name cleanly as well. Please see the below workflow for an example.
Best,
Senior Sales Engineer
Alteryx
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @DataNath ,
For New York, (\w+).+ only picks up New and left the York out. Also, I'm not sure I understand the entire expression there. I'm just foraying into RegEx. Could you please help me understand the expression?
Thanks,
K
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Thanks @ConnorK!
If I were to do this using RegEx, what's the best approach to cover all the punctuations? I'm trying to learn RegEx and that's why I want to use it in my workflow.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Thanks @DataNath!
This works! Could you please help me understand the expression as I'm still learning it?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
No problem @knnwndlm, happy to help! The current expression breaks down as follows:
(.+\w) < Captures all characters up until the last occurrence of a word character (denoted by \w). The reason it doesn't continue on until the other letters i.e. NY/CA/NV is because after the capture group i.e. everything bracketed, we state:
.+ < One or more of any character (the . is a wildcard for any character other than new line, the + means 'one or more'), followed by:
\w{2} < Two (made explicit by the {2}) word characters - in this case the state abbreviations.
We could've also used something like (.+\w).+[A-Z]{2} to be more specific to 2 capital letters but for a use like this I thought \w would be fine. Whilst picking up RegEx, and even for referring back to often when getting the hang of it, cheat sheets like so are always great:
https://cheatography.com/davechild/cheat-sheets/regular-expressions/
I always tend to test my expressions before running them in Alteryx as well, with my site of preference being https://regex101.com/
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Thanks @DataNath!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @DataNath,
I noticed that some city names were chopped apart based on (.+\w).+\w{2}. Here are a few examples:
ONTARIO becomes ONTA
COMPTON becomes COMP
TULARE becomes TUL
RANCHO CUCAMONGA becomes RANCHO CUCAMO
MONTEREY PARK becomes MONTERY P
Guessed I forgot to include the case where there's no CA/NV/NY. How do tell RegEx to ignore such cases?
Thanks,
K
