Alteryx Designer Desktop Discussions

jannis005 · ‎01-16-2020

Hi -

I am using Regex in my workflow to separate a few key identifiers in my dataset. The structure of the data has a few different formats, but the identifiers are consistently delimited by an underscore "_". See below for the different ways the identifiers present themselves. I am having trouble writing an expression that would be applicable to all scenarios. Can anyone offer some advice? Thanks.

Scenario 1: XXX_XXX_XXX

Scenario 2: XXXX_XXX_XXX

Scenario 3: extrainformationandtext(XXX_XXX_XXX)extrainformationandtext

The data groupings (XXX) vary from 3-6 or more, always separated by the underscore. The expression that has worked for the most datasets has been ".*\(?(.{3,})_(.{3,})_(.{3,}).*" but this does not capture the first grouping entirely if its more than 3 characters long.

ConnorK · ‎01-16-2020

Hi @jannis005 ,

What types of characters can the X's be?

Connor Kelleher
Senior Sales Engineer
Alteryx

jannis005 · ‎01-16-2020

Alphanumeric characters.

Jonathan-Sherman · ‎01-16-2020

Hi @jannis005,

This expression should work:

(\w{3,})_(\w{3,})_(\w{3,})

If this solves your issue please mark the answer as correct, if not let me know!

Regards,

Jonathan

jannis005 · ‎01-16-2020

I had an issue with the following format:

XXXXXXXXX XXXX XXXXXXXX XXXX - XXXXXXXX XX (XXX_XXX_XXX)

It works fine in Regex101.com but for some reason Alteryx does not give the expected response.

Thanks.

jannis005 · ‎01-16-2020

Never mind - the solution worked for me once I used the "RegEx" tool and not embedded in a Formula tool - appreciate your response!

Jonathan-Sherman · ‎01-16-2020

I was surprised when you said it didn't work, the expression looked fine to me. You can't parse multiple capture groups in a formula tool, the only way is to use a Regex_Replace function and reference the capture groups through using '$1' to bring in capture group 1 for example.

Alteryx Designer Desktop Discussions

Regex Grouping Question