Hi Experts,
I have sample data like this:
Test1@~^Test2@~^Test3@~^ where @~^ is the delimiter.
I was given a regular expression to tokenize this into the required columns. The expression is written like this: (.*?)(?:@~\^)
Can someone explain how this works?
I tried this on https://regex101.com/ and found the matching as in the screenshot. I have only just learned that .+ means a greedy search and .+? means a lazy search.
What I'm wondering is how the expression knows that the first group ends at Test1@~^ rather than matching the full string. Also, could someone explain the concept of an unmarked group?
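The first part, (.*?), is a non-greedy (lazy) match of any characters, captured as a marked group. It matches as few characters as possible while still letting the rest of the expression match.
- The .* means zero or more of any character
- The ? after it makes the match non-greedy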
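The second part, (?:@~\^), is a non-capturing (unmarked) group. It has to match the literal delimiter @~^ (the ^ is escaped because it is otherwise a special character), but the text it matches is not returned as a group.
- In other words, the expression matches from the start of a block until it finds a @~^ just ahead of it, and returns only the bit before this. That is why the first group ends at Test1: the lazy match stops at the first position where the delimiter follows.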
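To see the lazy versus greedy difference outside the RegEx tool, here is a minimal sketch in Python. It assumes Python's re module behaves like the tool's engine for this pattern, which I believe holds for these basic constructs but have not checked against the tool itself. The variable names are just for illustration.

```python
import re

sample = "Test1@~^Test2@~^Test3@~^"

# Lazy: (.*?) takes as few characters as it can before the delimiter.
lazy = re.match(r"(.*?)(?:@~\^)", sample)

# Greedy: (.*) takes as many characters as it can, then backs off
# only far enough for the final delimiter to match.
greedy = re.match(r"(.*)(?:@~\^)", sample)

lazy_first = lazy.group(1)      # 'Test1'
greedy_first = greedy.group(1)  # 'Test1@~^Test2@~^Test3'

# The whole match includes the delimiter, but only the marked
# group is captured; (?:...) does not add a group of its own.
lazy_whole = lazy.group(0)      # 'Test1@~^'
lazy_groups = lazy.groups()     # ('Test1',)
```

With the greedy version the first group swallows the earlier delimiters, which is exactly the "full string" behaviour the lazy ? avoids.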
In tokenise mode, the RegEx tool will return the first marked (capturing) group if there is one.
In this case, the slightly simpler (.*?)@~\^ would work exactly the same.
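As a rough check of that equivalence, the sketch below uses Python's re.findall as a stand-in for tokenise mode. That is an assumption on my part: findall also returns the first group for each successive match when the pattern has exactly one capturing group, but it is not the RegEx tool itself.

```python
import re

sample = "Test1@~^Test2@~^Test3@~^"

# With one capturing group, findall returns that group for each match.
with_unmarked_group = re.findall(r"(.*?)(?:@~\^)", sample)
without_group = re.findall(r"(.*?)@~\^", sample)

# Both give ['Test1', 'Test2', 'Test3']
same_tokens = with_unmarked_group == without_group
```

The unmarked group only becomes necessary when you need to group something, for example to apply a quantifier or alternation to the delimiter, without it showing up as an output.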
Thank you @jdunkerley79
🙂