SOLVED

Explanation for a regular expression

jaimonsk
8 - Asteroid

Hi Experts,

 

I have a sample data like this

Test1@~^Test2@~^Test3@~^   where @~^ is the delimiter

I was given a regular expression to tokenize this into the required columns. The expression is written like this: (.*?)(?:@~\^)

 

Can someone explain the working of this?

 

I tried this on https://regex101.com/ and found the matches shown in the screenshot. I have only just learned that .+ means a greedy search and .+? means a lazy search.

[Screenshot: regex101 match result]

 

What I'm wondering is how the expression knows that the first group ends at Test1@~^ and does not run on through the full string. Also, could someone explain the concept of unmarked grouping?

jdunkerley79
ACE Emeritus

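The first part, (.*?), is a non-greedy match of any characters. It matches as few characters as possible while still letting the rest of the expression match, which is why the first group stops at Test1 and does not run to the end of the string.

 - The .* means zero or more of any character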

 - The ? makes this non-greedy
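
The difference between the lazy and greedy forms can be seen in a small sketch. Python's `re` module is used here only as a stand-in for the Alteryx RegEx tool; for a pattern this simple the two should behave the same, but that is an assumption rather than something tested in Designer.

```python
import re

# Sample string from the question; @~^ is the delimiter.
sample = "Test1@~^Test2@~^Test3@~^"

# Lazy: (.*?) grows one character at a time and stops at the FIRST delimiter.
lazy = re.match(r"(.*?)@~\^", sample).group(1)

# Greedy: (.*) grabs everything, then backtracks to the LAST delimiter.
greedy = re.match(r"(.*)@~\^", sample).group(1)

print(lazy)    # Test1
print(greedy)  # Test1@~^Test2@~^Test3
```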

 

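The second part, (?:@~\^), is a non-capturing (unmarked) group

 - The (?: ... ) syntax groups the delimiter without storing it as a result. The \^ is escaped because a bare ^ would otherwise be read as an anchor.

 - In other words, each match runs from the start of a block up to and including the next @~^, but only the part before the delimiter is captured and returned.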

 

In tokenise mode, the RegEx tool will return the first captured (marked) group if there is one, not the whole match.

 

In this case, the slightly simpler, (.*?)@~\^, would work exactly the same.
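
As a rough check of that equivalence, here is a minimal sketch using Python's `re.findall`, which also returns the single captured group for each match when the pattern has exactly one. It is only an approximation of tokenise mode, assuming the Alteryx engine treats this pattern the same way.

```python
import re

sample = "Test1@~^Test2@~^Test3@~^"

# Original expression: captured text, then a non-capturing delimiter group.
with_group = re.findall(r"(.*?)(?:@~\^)", sample)

# Simplified expression: the delimiter is matched without any group at all.
without_group = re.findall(r"(.*?)@~\^", sample)

print(with_group)     # ['Test1', 'Test2', 'Test3']
print(without_group)  # ['Test1', 'Test2', 'Test3']
```

The non-capturing group only starts to matter when the delimiter needs grouping for another reason, for example to apply alternation or a quantifier to it.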

 

jaimonsk
8 - Asteroid

Thank you @jdunkerley79 

🙂
