Hi Designers,
We would like to extract just the portion in bold from the below string.
1) The portion below sometimes does not have the '-XX' indicator (Ie it is 12 characters)
2) The portion below is not always at the beginning of the string
We found some RegEx links, but are fairly new and not exactly sure how to apply what we've found to our specific case, below is a close example of our situation.
https://community.alteryx.com/t5/Alteryx-Designer-Discussions/REGex-Replace/td-p/395971
Example: G90000607960-14 GSK Helena Auguston 3 and 4 Sep 2019
Any ideas?
Solved! Go to Solution.
Hi @pmeloche
Will you always be extracting everything before the first space? If yes, I recommend a Text to Columns tool. Set the delimiter to a space (\s), and separate into 2 columns. The first will have the information you want.
Let me know if that helps!
Cheers,
Esther
Hi @pmeloche,
If what @estherb47 is suggesting doesn't work with your data, try the following expression in a RegEx parse: .*?(\w\d{11}\-*\d*)
This will work if you substring is always a letter followed by 11 digits with the optional dash and two digits. This would require modification if your substring is of variable length or format.
Had a thought on the RegEx approach. We can (also) make the -XX optional using this expression in a RegEx Tool set to Parse:
(\u\d{11}(?:-\d{2})?)
Brief explanation:
( # Begin capture group
\u # Match one upper case character
\d{11} # Match 11 digits
(?: # Begin nested non-capture group
- # Match the -
\d{2} # Match two digits
) # End nested non-capture group
? # Make nested non-capture group optional (catch it if you can!)
) # End capture group.
Example with tests is attached.
THIS IS EXACTLY WHAT WE NEEDED!!! Fantastic work! Yes i'm yelling because this is super exciting!
Thank you SO MUCH!