By lots of internet searches, and dumb luck, I wrote a REGEX_REPLACE formula that does exactly what I want. I'm hoping someone can explain why it works.
I wanted to evaluate a string containing 0-x underscores. If there were less than 2, then return everything to the left of the first underscore. If there were more than 2, then return everything to the left of the second to last underscore.
So, if the string was "abc_def", then I wanted to return "abc".
If it was "abc_def_hij_klm", then I only wanted "abc_def".
I did it using the following formula:
if REGEX_CountMatches([text_in], "_") < 2 then
REGEX_Replace([text_in], "(.+)_(.+)","$1") else
REGEX_Replace([text_in], "(.+)_(.+_.+)","$1") endif
I understand the if REGEX_CountMatches([text_in], "_") < 2 part. What I don't understand is why the last two lines work like they do. Specifically, why/how does the pattern work?
I have attached the workflow.
Hi @afinzerillo
I am guessing you expected is as below.
Workflow:
if REGEX_CountMatches([text_in], "_") < 2 then
// If there is only single underscore get the word before underscore which is by (.+)_ part
REGEX_Replace([text_in], "(.+)_(.+)","$1")
else
// If there is more than one underscore get the word before first underscore and after first underscore which is by ([^_]+_[^_]+)_ part. [^_]+ say any character expect underscore.
REGEX_Replace([text_in], "([^_]+_[^_]+)_(.+)","$1")
endif
Hope this helps : )
I expected the output as written and produced in the workflow I uploaded. See below.
Notice how "abc_def_efg_hij_klm_nop_qrs_tuv_wxy_z12" is returned as "abc_def_efg_hij_klm_nop_qrs_tuv". I get everything to the left of the second to last underscore. the syntax I wrote, "(.+)_(.+_.+)" works like I want it no matter how many underscores there are in the string. I just don't know how Alteryx is processing "(.+)_(.+_.+)" to make it happen.
Hi @afinzerillo
In REGEX_Replace([text_in], "(.+)_(.+_.+)","$1") formula .+ means one or more any characters. Hence whats Regex is doing is its trying to fit the max in the first bracket such that it still maintains pattern (string)_(string_string) this is how the Regex works it will try to fit the max in the first occurrence of the pattern also keeping into the consideration that it meets the whole pattern.
Hope this helps : )
Hello @afinzerillo ,
I'll recommend that you take this interactive tutorial about regex to understand what each piece does:
It's been very useful for me.
Also another useful resource is this interactive regex tester
Hope that helps
Gabriel