Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Regex tool Parse method using non-greedy matching when not specified?

Jeremy
8 - Asteroid

I'm hoping some expert on Regex in Alteryx will see this and tell me what I'm doing wrong.

 

I cannot figure out how to make Alteryx parse out strings I'm asking of it.  I can do this easily in other tools, but it appears Alteryx is using non-greedy matching (i.e. taking the minimum number of characters to match a string) even when I don't specify it.

 

Consider the string "0 YR/2052 2,800,000.00"

 

Using the Regex tool in Parse mode, this: .*((\d{1,3},)*\d{1,3}\.\d\d).* returns only "0.00" which are the minimum number of characters needed to match the regular expression.  The exact same expression in other tools, including online Regex parsers, returns the expected "2,800,000.00"

 

Example of it working here: regexr.com/7cli2

 

The problem I'm trying to solve is a field with bogus data mixed with numbers I want.  I need the "\d\.\d\d" at the end of the string to be required because all valid numbers always have at least 0.01 and any number with two decimals up to 1 billion is valid.  Part of the bogus data could be numbers, so I can't use RegexReplace() to get rid of all non-numbers, I have to find matching strings without spaces and with decimals that are properly formatted as a number with thousands separators.

7 REPLIES 7
Yoshiro_Fujimori
15 - Aurora

Hi @Jeremy ,

I am not an expert of RegEx at all, but how do you thnk about this expression?

 

REGEX_Replace([Data], ".*?([\d,]+\.\d{2}).*", "$1")

 

Yoshiro_Fujimori_0-1682384311299.png

 

RobertOdera
13 - Pulsar

Hi, @Jeremy 

 

At the risk of going too "low tech," is there a pattern to the value you always want?

 

If, like in this example, it is always the third "word," then consider using GetWord([Field1], 2), else I would defer to @Yoshiro_Fujimori solution - Cheers!

Jeremy
8 - Asteroid

Thanks for the idea, @Yoshiro_Fujimori .  Unfortunately, that fails as well.  Other problem fields include "0001" which your expression matches and is invalid data.

 

I have tried my expression as well like this: Regex_Replace([Data], "(\d{1,3},)*\d{1,3}\.\d\d", "$1") and that fails in new and exciting ways.  I think my problem here is I need to use groups with () to match my optional groups of "\d{1,3}," but Alteryx is using () to denote a marked group.  In the Regex tool in Parse mode, I'll just get extra fields that I can throw away, but doing this with Regex_Replace and specifying a group doesn't work.

Jeremy
8 - Asteroid

Thanks, @RobertOdera , but no, the data is a mess and can contain any number of other words and characters or none at all.  Fortunately, the pattern I'm looking for always has at least a leading and a trailing space.  It's a perfect application for regular expressions, I just can't make Alteryx do what I need.

Yoshiro_Fujimori
15 - Aurora

Hi @Jeremy ,

 

I also tried RegEx tool, and it seems to work.

If it does not work on your case, could you give me a sample data where it does not work?

 

Input Data

0 YR/2052 2,800,000.00 0 YR/2052 2,800,000.00 0 YR/2052 2,800,000.00 0 YR/2052 2,800,000.00 0 YR/2052 2,800,000.00

 

RegEx configuration

Yoshiro_Fujimori_1-1682386006903.png

 

Output Data

Yoshiro_Fujimori_0-1682385988188.png

 

 

RobertOdera
13 - Pulsar

Ah, okay, got it @Jeremy 

 

For now, since you talked about a pattern to identify the rows that need review, consider the below:

Tomorrow AM, I will chase the rabbit on the Regex - do you have a set of "known" exceptions that the Regex Replace is running up against? maybe, to start, we can treat for those (and update as new rabbits pop up - what's up, doc...hehehe😎)

 

RobertOdera_0-1682386249080.png

 

Jeremy
8 - Asteroid

@Yoshiro_Fujimori - I did get your method working with a slight modification.  I could have sworn I tried this last night, but it was very late for me, so I no doubt made a mistake.

 

Attached is a flow with some sample data and the three primary methods I've tried.  The top should work but doesn't, your expression using the RegEx tool in Parse mode, and the original Regex_Replace method, which does not work but really should based on the documentation.

 

In case it matters, I'm currently using Alteryx Designer 2022.3.1.430

Labels