
Alteryx Designer Desktop Discussions

SOLVED

RegEx an Address substring

MattC1
6 - Meteoroid

I've got a data set where a majority of the Address fields contain "ATTN: firstname lastname ### Address" (see below), and I'm trying to pull out only the actual street address, without the ATTN: or the names that come before it. It seems like I'll need a RegEx/Formula Tool to accomplish this, but I'm having a disconnect on which RegEx operators to use in my expression. Any help will be greatly appreciated:

 

From This:

ATTN: MATTHEW OR MARC LASTNAME 1234 WINTON ROAD
ATTN: RON LASTNAME 123 EAST 2ND STREET

 

To This:

1234 WINTON ROAD
123 EAST 2ND STREET
Joe_Mako
12 - Quasar

How about the attached that uses the expression:

[^\d]*(.*)

and returns just:

$1

regex.png

MattC1
6 - Meteoroid

Looks like that was the trick, thanks Joe!  Now I'll try to decode " [^\d]*(.*) " to better understand.

Joe_Mako
12 - Quasar

Does this help:

 

[^\d] any character that is not a digit
* zero or more times

 

( begin capture group
. any character
* zero or more times
) end capture group

 

The $1 references and returns the characters within the capture group

 

This will exclude all characters before the first digit character found, and return only the characters at or after that first digit.
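For anyone who wants to try the pattern outside of Alteryx, here is a minimal sketch in Python. It assumes Python's `re` module, which is not the Alteryx RegEx tool and may differ in details; `strip_prefix` is a hypothetical helper name, and `group(1)` plays the role of `$1`.

```python
import re

# Same pattern as in the thread: skip non-digits, then capture the rest.
PATTERN = re.compile(r"[^\d]*(.*)")


def strip_prefix(address: str) -> str:
    """Return the text from the first digit onward (hypothetical helper)."""
    match = PATTERN.match(address)
    # group(1) is the capture group, i.e. what $1 refers to in the thread.
    return match.group(1) if match else address


samples = [
    "ATTN: MATTHEW OR MARC LASTNAME 1234 WINTON ROAD",
    "ATTN: RON LASTNAME 123 EAST 2ND STREET",
]
for s in samples:
    print(strip_prefix(s))
```

One thing to watch: if a value contains no digit at all, the non-digit part consumes the whole string and the capture group comes back empty, so rows without a street number would need separate handling.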

MarqueeCrew
20 - Arcturus

When I saw [^\d], I remembered a shorthand version of:  \D

 

Here are some others....

SHORTHAND CHARACTER CLASSES

The basic ones are:

  • \s: White space characters
  • \S: Non-white space characters
  • \w: Word characters
  • \W: Non-word characters
  • \d: Digit characters
  • \D: Non-digit characters
  • \c: Control character, e.g. \cZ = Ctrl+Z
  • \x: Hexadecimal character code, e.g. \xA9 = ©
  • \0: Octal character code, similar to the hexadecimal form
  • \s — Equates to any white space character. This includes newline characters (\n and \r in most implementations however some differ) and a normal space character or a tab character (\t).
  • \S — This is the inverse of \s, it will match any character that's not deemed to be a white space character.
  • \w — The equivalent character class would be: [a-zA-Z0-9_] This character class has a couple of character ranges: lower case a to z, upper case A to Z and 0 to 9, and an underscore. Note the inclusion of numbers and the underscore in this range. I often use the following to include hyphens: [\w-]
  • \W — Similarly, the character class of this one would be: [^a-zA-Z0-9_] This ^ character is new to us, it negates the character class so means any characters except the ones stated. e.g. [^W] would match any character apart from an upper case W.

 

  • \d — Similar to the word character class, digit matches: [0-9]
  • \D — Again, this is similar to the Non-word character class where it matches any non-digit character. e.g. [^0-9] using the ^ again to negate the character class.

http://www.andrewgoodricke.com/blog/regex-shorthand-classes/
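A quick way to check a few of these shorthands is a short Python sketch. This assumes Python's `re` module rather than the Alteryx engine, and the sample string is made up for illustration. In Python, `\w` and `\d` also match non-ASCII letters and digits unless `re.ASCII` is set, so the flag is used here to match the character classes listed above.

```python
import re

# Made-up sample text for illustration only.
text = "ATTN: RON 123 EAST 2ND-STREET"

# \D is shorthand for [^\d]: both find the same runs of non-digits.
assert re.findall(r"\D+", text) == re.findall(r"[^\d]+", text)

# With re.ASCII, \w behaves like [a-zA-Z0-9_].
words = re.findall(r"\w+", text, flags=re.ASCII)

# Adding a hyphen to the class keeps hyphenated tokens together.
tokens = re.findall(r"[\w-]+", text, flags=re.ASCII)

print(words)
print(tokens)
```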

SeanAdams
17 - Castor

@Joe_Mako and @MarqueeCrew - you are the RegEx kings.   It's not that I don't like RegEx, I'm just always worried that I may accidentally blow up something, or cut my finger off in error, or short out the power to some subset of the country :-)

 

I really think there's fertile ground in folks like you who have learned how to work productively with RegEx doing a quick tool Mastery with @MattD or a live training with @JoeM because RegEx really is powerful once you get your head 'round it, and you both have been such active and helpful contributors in this space!

 

 

MattD
Alteryx Alumni (Retired)

Good thinking, @SeanAdams ;) It's no substitute for @Joe_Mako or @MarqueeCrew's expertise, but we already have a Regex Tool Mastery that goes through the different match options and pattern syntax if you guys would like to see anything added!

 

Hopefully, this will help keep the accidents to a minimum :P

 

Former Alteryx, Inc. Support Engineer, Community Data Architect, Data Scientist then Data Engineer