Hi -
I am using Regex in my workflow to separate a few key identifiers in my dataset. The structure of the data has a few different formats, but the identifiers are consistently delimited by an underscore "_". See below for the different ways the identifiers present themselves. I am having trouble writing an expression that would be applicable to all scenarios. Can anyone offer some advice? Thanks.
Scenario 1: XXX_XXX_XXX
Scenario 2: XXXX_XXX_XXX
Scenario 3: extrainformationandtext(XXX_XXX_XXX)extrainformationandtext
The data groupings (XXX) vary from 3-6 or more, always separated by the underscore. The expression that has worked for the most datasets has been ".*\(?(.{3,})_(.{3,})_(.{3,}).*" but this does not capture the first grouping entirely if its more than 3 characters long.
Solved! Go to Solution.
Hi @jannis005 ,
What types of characters can the X's be?
Alphanumeric characters.
Hi @jannis005,
This expression should work:
(\w{3,})_(\w{3,})_(\w{3,})
If this solves your issue please mark the answer as correct, if not let me know!
Regards,
Jonathan
I had an issue with the following format:
XXXXXXXXX XXXX XXXXXXXX XXXX - XXXXXXXX XX (XXX_XXX_XXX)
It works fine in Regex101.com but for some reason Alteryx does not give the expected response.
Thanks.
Never mind - the solution worked for me once I used the "RegEx" tool and not embedded in a Formula tool - appreciate your response!
I was surprised when you said it didn't work, the expression looked fine to me. You can't parse multiple capture groups in a formula tool, the only way is to use a Regex_Replace function and reference the capture groups through using '$1' to bring in capture group 1 for example.