Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Extract Data in a Specific Format That Could Be in Different Position of A String

EYU07
7 - Meteor

The Almighty Alteryx Community, 

 

I am struggling to extract some IDs from a string.  The IDs are in a specific format:  they are 8 digits long; the 1st and last character in the IDs are almost always a capital letter (with very few exception that it starts with a number instead of a letter, but I can live with some manual investigation); the IDs can appear in various positions of the string.  Lastly, if there is no ID information in such format, then I'd just bring in the whole text from the "raw data" column for manual lookup.  Examples would be as below:

 

Raw DataDesire Output
This is a test - A2AA111BA2AA111B
2019-08 AAAA222C - testAAAA222C
AABB3F4G - This is a testAABB3F4G
082019 - This is a test_AA7C55FGAA7C55FG
String with no data in ID formatString with no data in ID format

 

Is there a way to identify these 8-character IDs and extract only them from the string?  Please help!!!

 

Best regards,

yyu3

7 REPLIES 7
NickSm
Alteryx
Alteryx

Hey @EYU07,

 

Regex is probably the way to go to find a pattern of 8 digits/letters.  How specific you need to make the pattern depends on the potential values you'd be looking for.  For example, this looks for a string of 8 characters that starts with 'A' and returns it.

 

clipboard_image_0.png

EYU07
7 - Meteor

Hi @NickSm 

 

Thank you for the quick reply!  The first letter is not limited to "A", can be any letter A-Z, same goes with the last letter.  And one concern I have is, what if there is a 8-character word or something that's in all cap but has a space within, like "RCL FROM", that occurs before the ID, would this return that word instead of the ID?  I am wondering, if there is a way to find the ID by specifying the first and last letter to be capital letters and with no space in the return result, so that it could avoid returning a 8-character word.  

 

Sorry if it's confusing, just bad data that I'm trying to clean up.

 

Thanks so much!

NickSm
Alteryx
Alteryx

@EYU07 

 

Sure, to get an uppercase followed by 6 characters followed by an uppercase, no spaces, you would want the expression to be something like

 

([A-Z]{1}\w{6}[A-Z]{1})

EYU07
7 - Meteor

And if I want to change the 2nd to the last character to be a number?  In my data, the word "AMREICAN" happens to be in the situation that I mentioned before and it's returning "AMERICAN" instead of the ID... Otherwise, the updated expression worked for all other instances.

NickSm
Alteryx
Alteryx

Got it, so in that case you're on the right track to make it a little more specific.  Could just modify that expression to:

 

([A-Z]{1}\w{5}[0-9]{1}[A-Z]{1})

EYU07
7 - Meteor

Exactly what I wanted to achieve!  Thanks so much, I really appreciate it!

hpallett
5 - Atom

Hi, 

I am struggling to extract some IDs from a string. The IDs are in a specific format: Upper case letter, upper case letter with 7 Numbers after no spaces. 

I am struggling to extract some IDs from a string please? thank you

Labels