Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Find number in a string

FreeRangeDingo
11 - Bolide
11 - Bolide

I am trying to use the Regex tool to find a word and a series of numbers.  

 

The column I am working with is a string column that is a "comment" column, meaning any text or numbers can be entered into the column.  I need to locate the word "Cluster" and then pull the 2 - 4 numbers after that.  The numbers will have a period.  Here are some examples...

 

Cluster 2.3

Cluster 11.1

Cluster 11.10

 

There may be text before or after the word "Cluster". 

 

After I find that, I need the first 1 - 2 numbers.  

 

I know this is possible with the regex tool.  I just don't know enough regex to write it.  Thanks in advance.

 

 

 

3 REPLIES 3
Thableaus
17 - Castor
17 - Castor

Hi @FreeRangeDingo 

 

You may try this REGEX.

 

REGEX_Replace([Field], ".*Cluster.*?(\d{1,2}).*", "$1")

 

Cheers,

FreeRangeDingo
11 - Bolide
11 - Bolide

With a filter or two applied before the expression, I can make this work.  Can you help me understand the syntax?

 

REGEX_Replace([perfstatus comment], ".*Cluster.*?(\d{1,2}).*", "$1")

 

.*  -- means zero or more of any character; I assume this is there to indicate that there could be text or characters before or after the word cluster.   

? -- means once or none

\d -- is matching a digit (as opposed to a non digit).

\d{1,2} -- means 1 to 2 digits (or 1 to 2 times) before the period

(\d{1,2}).* -- I think the parenthesis are grouping, but I don't know why they are needed.  

$ -- means end of string, but I don't know what "$1" means.

 

Thanks for the response!

 

Thableaus
17 - Castor
17 - Castor

@FreeRangeDingo 

 

You're close in those explanations.

 

.*  -- means zero or more of any character; I assume this is there to indicate that there could be text or characters before the word cluster.   - correct

.*? -- after the Cluster word, this means give me any character until it finds the next one (which is the digit). So it's non-greedy.

\d -- is matching a digit (as opposed to a non digit). - correct

\d{1,2} -- means 1 to 2 digits (or 1 to 2 times) before the period - correct

(\d{1,2}).* -- I think the parenthesis are grouping, but I don't know why they are needed. They are needed because you're using REGEX Replace. So basically, the first part of the REGEX is what you're replacing (in our case, we're replacing the whole string), and we need to group what we will use to replace this whole string.  

$1-- This is a reference to our first grouping. What does that mean? It means we're replacing our full string by (\d{1,2}) found in the middle of this string. $1 is a reference to the grouping we did using the parenthesis. If we had another set of parenthesis, then it could be referenced with $2.

 

Cheers,

 

Labels