Alteryx Designer Desktop Discussions

jaimonsk · ‎08-21-2019

Hi Experts,

I have sample data like this:

Test1@~^Test2@~^Test3@~^, where @~^ is the delimiter.

I was given a regular expression to tokenize this into the required columns. The expression is: (.*?)(?:@~\^)

Can someone explain how it works?

I tried it on https://regex101.com/ and found the matches shown in the screenshot. I have just learned that .+ means a greedy search and .+? means a lazy search.

What I am wondering is how the expression knows that the first group ends at Test1@~^ rather than matching the full string. Also, could someone explain the concept of unmarked grouping?

jdunkerley79 · ‎08-21-2019

The first part, (.*?), is a non-greedy (lazy) match of any characters, captured as a group. It matches only as many characters as are needed for the whole expression to match.

- The .* means zero or more of any character

- The ? after it makes the match non-greedy
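To see the difference between the greedy and non-greedy forms on the sample data, here is a minimal sketch. It uses Python's re module as a stand-in, which is an assumption: the Alteryx RegEx tool uses its own engine, although this simple pattern should behave the same way in both.

```python
import re

sample = "Test1@~^Test2@~^Test3@~^"

# Greedy: .* grabs as much as it can, then backs off only as far as
# needed, so the match ends at the LAST delimiter in the string.
greedy = re.match(r"(.*)@~\^", sample).group(1)

# Non-greedy: .*? grabs as little as it can, so the match ends at the
# FIRST delimiter it reaches.
lazy = re.match(r"(.*?)@~\^", sample).group(1)

print(greedy)  # Test1@~^Test2@~^Test3
print(lazy)    # Test1
```

This is why the first group ends at Test1 rather than swallowing the whole string.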

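The second part, (?:@~\^), is a non-capturing (unmarked) group: it has to match the delimiter @~^, but the text it matches is not returned as a group.

- In other words, the expression matches from the start of a block up to and including the next @~^, and returns only the part before the delimiter.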

In tokenise mode, the RegEx tool returns the first capturing group if there is one.

In this case, the slightly simpler expression (.*?)@~\^ would work exactly the same.
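As a rough check of that equivalence, the sketch below emulates tokenising with Python's re.findall, which returns the contents of the capturing group for each match when the pattern has exactly one. Treating findall as an approximation of the RegEx tool's tokenise mode is an assumption made for illustration, not a description of how Alteryx is implemented.

```python
import re

sample = "Test1@~^Test2@~^Test3@~^"

# Original expression: capturing group followed by a non-capturing group.
with_unmarked_group = re.findall(r"(.*?)(?:@~\^)", sample)

# Simplified expression: the delimiter is matched without any group.
without_group = re.findall(r"(.*?)@~\^", sample)

print(with_unmarked_group)  # ['Test1', 'Test2', 'Test3']
print(without_group)        # ['Test1', 'Test2', 'Test3']
```

Both forms return the same tokens, because the delimiter is consumed by the match in either case and only the capturing group is returned.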

jaimonsk · ‎08-21-2019

Thank you @jdunkerley79

🙂

Explanation for a regular expression