Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

REGEX_REPLACE - Please explain why this works

afinzerillo
8 - Asteroid

By lots of internet searches, and dumb luck, I wrote a REGEX_REPLACE formula that does exactly what I want.  I'm hoping someone can explain why it works.

 

I wanted to evaluate a string containing 0-x underscores.  If there were less than 2, then return everything to the left of the first underscore.  If there were more than 2, then return everything to the left of the second to last underscore.  

 

So, if the string was "abc_def", then I wanted to return "abc".  

If it was "abc_def_hij_klm", then I only wanted "abc_def".

 

I did it using the following formula:

if REGEX_CountMatches([text_in], "_") < 2 then
REGEX_Replace([text_in], "(.+)_(.+)","$1") else
REGEX_Replace([text_in], "(.+)_(.+_.+)","$1") endif

 

I understand the if REGEX_CountMatches([text_in], "_") < 2 part.  What I don't understand is why the last two lines work like they do.  Specifically, why/how does the pattern work?  

 

I have attached the workflow.  

4 REPLIES 4
atcodedog05
22 - Nova
22 - Nova

Hi @afinzerillo 

 

I am guessing you expected is as below.

 

Workflow:

atcodedog05_0-1633105794360.png

 

if REGEX_CountMatches([text_in], "_") < 2 then

// If there is only single underscore get the word before underscore which is by (.+)_ part

REGEX_Replace([text_in], "(.+)_(.+)","$1")

else

// If there is more than one underscore get the word before first underscore and after first underscore which is by ([^_]+_[^_]+)_ part. [^_]+ say any character expect underscore.

REGEX_Replace([text_in], "([^_]+_[^_]+)_(.+)","$1")

endif

 

 

Hope this helps : )

afinzerillo
8 - Asteroid

@atcodedog05 

 

I expected the output as written and produced in the workflow I uploaded.  See below.

Notice how "abc_def_efg_hij_klm_nop_qrs_tuv_wxy_z12" is returned as "abc_def_efg_hij_klm_nop_qrs_tuv".  I get everything to the left of the second to last underscore.  the syntax I wrote, "(.+)_(.+_.+)" works like I want it no matter how many underscores there are in the string.  I just don't know how Alteryx is processing "(.+)_(.+_.+)" to make it happen.

 

 

afinzerillo_0-1633106286168.png

 

atcodedog05
22 - Nova
22 - Nova

Hi @afinzerillo 

 

In REGEX_Replace([text_in], "(.+)_(.+_.+)","$1") formula .+ means one or more any characters. Hence whats Regex is doing is its trying to fit the max in the first bracket such that it still maintains pattern (string)_(string_string) this is how the Regex works it will try to fit the max in the first occurrence of the pattern also keeping into the consideration that it meets the whole pattern.

 

Hope this helps : )

Garabujo7
Alteryx
Alteryx

Hello @afinzerillo ,

 

I'll recommend that you take this interactive tutorial about regex to understand what each piece does:

https://regexone.com/

It's been very useful for me.

 

Garabujo7_0-1633107841043.jpeg

 

Also another useful resource is this interactive regex tester

 

https://regex101.com/

 

Hope that helps

Gabriel

 

 

Labels