The Almighty Alteryx Community,
I am struggling to extract some IDs from a string. The IDs are in a specific format: they are 8 digits long; the 1st and last character in the IDs are almost always a capital letter (with very few exception that it starts with a number instead of a letter, but I can live with some manual investigation); the IDs can appear in various positions of the string. Lastly, if there is no ID information in such format, then I'd just bring in the whole text from the "raw data" column for manual lookup. Examples would be as below:
Raw Data | Desire Output |
This is a test - A2AA111B | A2AA111B |
2019-08 AAAA222C - test | AAAA222C |
AABB3F4G - This is a test | AABB3F4G |
082019 - This is a test_AA7C55FG | AA7C55FG |
String with no data in ID format | String with no data in ID format |
Is there a way to identify these 8-character IDs and extract only them from the string? Please help!!!
Best regards,
yyu3
Solved! Go to Solution.
Hey @EYU07,
Regex is probably the way to go to find a pattern of 8 digits/letters. How specific you need to make the pattern depends on the potential values you'd be looking for. For example, this looks for a string of 8 characters that starts with 'A' and returns it.
Hi @NickSm
Thank you for the quick reply! The first letter is not limited to "A", can be any letter A-Z, same goes with the last letter. And one concern I have is, what if there is a 8-character word or something that's in all cap but has a space within, like "RCL FROM", that occurs before the ID, would this return that word instead of the ID? I am wondering, if there is a way to find the ID by specifying the first and last letter to be capital letters and with no space in the return result, so that it could avoid returning a 8-character word.
Sorry if it's confusing, just bad data that I'm trying to clean up.
Thanks so much!
Sure, to get an uppercase followed by 6 characters followed by an uppercase, no spaces, you would want the expression to be something like
([A-Z]{1}\w{6}[A-Z]{1})
And if I want to change the 2nd to the last character to be a number? In my data, the word "AMREICAN" happens to be in the situation that I mentioned before and it's returning "AMERICAN" instead of the ID... Otherwise, the updated expression worked for all other instances.
Got it, so in that case you're on the right track to make it a little more specific. Could just modify that expression to:
([A-Z]{1}\w{5}[0-9]{1}[A-Z]{1})
Exactly what I wanted to achieve! Thanks so much, I really appreciate it!
Hi,
I am struggling to extract some IDs from a string. The IDs are in a specific format: Upper case letter, upper case letter with 7 Numbers after no spaces.
I am struggling to extract some IDs from a string please? thank you