Hi All,
I am trying to understand the difference in how the RegEx tool's Match and Parse modes behave.
When I use Match, it considers the entire pattern and evaluates whether the string matches it or not, but Parse doesn't seem to do that.
Take the string in the 5th row (Image 1), 'Furniture-ABC-1234'. It doesn't match the regex pattern '[A-Za-z]*-(ABC)|XYZ-\d{3,4}' or '[A-Za-z]*-ABC|XYZ-\d{3,4}', and Match reports exactly that. But when I use Parse on it (Image 2), it is able to capture 'ABC'. Why?
I thought it might work from left to right, and that if the string matches up to the capturing part, it captures that part and outputs it. Going by that logic, I tried to capture 'ABC' with the pattern '[A-Za-z]*-(ABC)-hkujbkghjv' (Image 3). If my assumption were correct, it should have captured 'ABC' here as well, but this time it didn't.
Then I thought it had something to do with the pipe symbol (OR), so I used the pattern '[A-Za-z]*-(ABC)|-hkujbkghjv' (Image 4), and this time it was able to capture again.
I still can't work out the precise logic behind this, so I'd appreciate it if someone could clear it up. Even ChatGPT and Copilot gave up on this.
These are totally different functions.
Regex_Match looks to match the entirety of a term against a regex expression. It matches or it doesn't match (boolean true/false). Regex parse/replace looks to extract a marked group (or multiple marked groups) from a term. It matches if the marked group occurs and if the regex matches UP TO THE POINT OF THE MARKED GROUP. In your case the marked group (what it's looking for) is "(ABC)". The term matches at the point the marked group occurs, and Regex sees this as a match. It gives you the value of your marked group, which in this case is a) static and b) ABC.
Also, you are using | incorrectly. | is an OR between whole expressions, not between specific terms. For example, "(\d+hello)|(hello\d+)" matches any term that has either numbers followed by the word hello, or the word hello followed by numbers.
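To make the point about | concrete, here is a minimal sketch in Python. It assumes Python's re module is a fair stand-in for the RegEx tool's engine on this point (alternation having the lowest precedence is standard in Perl-style regex), and the helper name first_hit is made up for illustration. It shows that the pattern from the question splits into two whole branches, '[A-Za-z]*-(ABC)' and 'XYZ-\d{3,4}', rather than choosing between ABC and XYZ.

```python
import re

# Alternation binds last, so this pattern reads as:
#   branch 1: [A-Za-z]*-(ABC)
#   branch 2: XYZ-\d{3,4}
pattern = re.compile(r"[A-Za-z]*-(ABC)|XYZ-\d{3,4}")

def first_hit(text):
    """Return (matched text, group 1) for the first match, or None."""
    m = pattern.search(text)
    return (m.group(0), m.group(1)) if m else None

# Branch 1 matches 'Furniture-ABC'; the trailing '-1234' is never consumed.
print(first_hit("Furniture-ABC-1234"))  # ('Furniture-ABC', 'ABC')

# Branch 2 matches, but it has no capture group, so group 1 is None.
print(first_hit("XYZ-1234"))            # ('XYZ-1234', None)
print(first_hit("Furniture-XYZ-1234"))  # ('XYZ-1234', None)
```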
Thanks for the reply, but you said that for it to be able to parse, the conditions are:
1. Marked group occurs
2. The regex matches UP TO THE POINT OF THE MARKED GROUP
But if that were enough, it should have been able to parse the pattern in Image 3 as well. What I think is that some portion of the string (at the start, in the middle, at the end, or anywhere) must match the complete regex pattern, and only then will it parse.
And I am aware of how '|' is used. I intentionally placed it this way to highlight the issue.
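The hypothesis above can be checked with a small Python sketch. This is an analogy, not the tool's actual implementation: it assumes Match behaves like a whole-string match (re.fullmatch) and Parse behaves like a search for the first substring that satisfies the complete pattern (re.search). The names match_like and parse_like are hypothetical helpers.

```python
import re

text = "Furniture-ABC-1234"

# The three patterns from the question.
patterns = {
    "pipe_xyz": r"[A-Za-z]*-(ABC)|XYZ-\d{3,4}",    # Images 1 and 2
    "no_pipe": r"[A-Za-z]*-(ABC)-hkujbkghjv",      # Image 3
    "pipe_junk": r"[A-Za-z]*-(ABC)|-hkujbkghjv",   # Image 4
}

def match_like(pattern, s):
    """Assumed Match behaviour: the whole string must match the pattern."""
    return re.fullmatch(pattern, s) is not None

def parse_like(pattern, s):
    """Assumed Parse behaviour: find a substring matching the complete
    pattern and return capture group 1 (None if nothing matches)."""
    m = re.search(pattern, s)
    return m.group(1) if m else None

for name, pattern in patterns.items():
    print(name, match_like(pattern, text), parse_like(pattern, text))
# pipe_xyz False ABC
# no_pipe False None
# pipe_junk False ABC
```

Under these assumptions the results line up with all four images: the substring 'Furniture-ABC' satisfies the first branch of either pipe pattern in full, while nothing in the string satisfies the whole of the Image 3 pattern.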
Hi @Lakshay_khanna,
This is an interesting one...
You're correct, it's down to the difference in how Parse and Match work. Parse works from left to right and extracts anything in the capture groups it finds along the way, until the string no longer matches the pattern, whilst Match must match the entire string to pass.
For your Furniture-ABC-1234 example, you'd need to wrap ABC and XYZ each in a non-capturing group, "(?:...)", inside a single capturing group, so that the OR applies only to ABC or XYZ rather than to the whole pattern. You'd need to use:
[A-Za-z]*-((?:ABC)|(?:XYZ))-\d{3,4}
This would extract the ABC or XYZ in parse AND match the entire string too. Happy to expand on this if you need some more info!
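As a quick check of that pattern outside Alteryx, here is a minimal Python sketch. It assumes Python's re module handles this pattern the same way the RegEx tool does, and extract is a hypothetical helper name.

```python
import re

# The pattern from the reply above: the OR is confined to one capture group.
fixed = re.compile(r"[A-Za-z]*-((?:ABC)|(?:XYZ))-\d{3,4}")

# An equivalent, shorter form with the same capture group.
simpler = re.compile(r"[A-Za-z]*-(ABC|XYZ)-\d{3,4}")

def extract(s, pattern=fixed):
    """Return group 1 if the WHOLE string matches, else None."""
    m = pattern.fullmatch(s)
    return m.group(1) if m else None

print(extract("Furniture-ABC-1234"))  # ABC
print(extract("Furniture-XYZ-123"))   # XYZ
print(extract("Furniture-DEF-1234"))  # None (neither ABC nor XYZ)
print(extract("Furniture-ABC-12"))    # None (needs 3 or 4 digits)
```

Because the whole string now has to fit the pattern for either branch, a whole-string match and a capture agree on these inputs.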
Kind regards,
Jonathan