Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

RegEx to Parse Out Digits

knnwndlm
8 - Asteroid

Hi SME,

 

In the attachment, I have a dataset where column C (Value) has a bunch of string with embedded digits.  All my rows have embedded digits at the end except rows 155 and 156 where the numbers are embedded in between the pipes ("|").  

 

This is the RegEx that I used to peel the number at the end:  (\d+$).  How can I modify this so that any rows such 3 or 4 would pick up 1.000000 in the RegEx output and for rows 155/156,  would yield the numbers embedded in between the pipes (196266 and 1.000000, respectively)?  I've fixed the problem using the Formula tool right after the RegEx too.  I would like to see if it's possible to avert the Formula tool in this case with the right RegEx script.

 

Thanks,

kwl

13 REPLIES 13
knnwndlm
8 - Asteroid

This is in essence what I'm trying to get.

knnwndlm
8 - Asteroid

The following RegEx seems to mostly work:  (-?\d+(\.\d+)?)  

 

However, I need to modify this to do two things:  one, to ignore the "-" in front of any number and two, to add a period for anything that has 6 digits.  I can certainly use the Cleansing tool to get rid of any characters and the Formula tool to add a period in front of any 6 digits, but I would like to see if this can be done in RegEx.  Appreciate your help.  Thank you!

CoG
14 - Magnetar

I'm not sure that you can use The RegEx Tool to dynamically add a period in some cases. You could opt for REGEX_Replace in a Formula Tool to cut down on tool usage:

Screenshot.png

 

First expression: REGEX_Replace([Value],"^.*?-?(\d*\.?\d{2,})(?:[^0-9].*$|$)", "$1")

Second: IIF(Length([Regex_Out])=6,"."+[Regex_Out],[Regex_Out])

 

Hope this helps and Happy Solving!

binuacs
21 - Polaris

@knnwndlm one way of doing this

image.png

knnwndlm
8 - Asteroid

@CoG - Thank you!

 

@binuacs - Could you please help me understand the RegEx?  Thank you!

knnwndlm
8 - Asteroid

@CoG / @binuacs - I have a similar kind of problem that I'm trying to figure out the best approach.  I have column C with a string that has all kinds of gibberish with amounts that I want to peel out.  One way to do is to use the Filter tool to get rid of anything that has the word "cost" in it and the Text to Columns tool to get the number.  Is there a RegEx approach that might be better?  

 

Thanks,

kwl

binuacs
21 - Polaris

@knnwndlm here you go

image.png

binuacs
21 - Polaris
Regex explanation
--------------------
.*: Matches any characters before the monetary value.
(\$[0-9,]*\.?[0-9]{0,2}): Captures a monetary value (a dollar sign followed by digits, optional commas, optional decimal point, and optional cents).
\s?: Matches an optional whitespace character following the monetary value.
.*: Matches any characters after the monetary value.
binuacs
21 - Polaris
explanation of the first regex (?:\|(-?\d*\.?\d+)|\.\s*(-?\d*\.?\d+)|\s(-?\d*\.?\d+))
(?: ... ): This is a non-capturing group that contains three alternatives separated by | (pipe character, dot with optional whitespace, and whitespace).
\|(-?\d*\.?\d+): Matches a numerical value following a pipe character and captures it.
\.\s*(-?\d*\.?\d+): Matches a numerical value following a dot and optional whitespace characters, and captures it.
\s(-?\d*\.?\d+): Matches a numerical value following a whitespace character and captures it.
Labels