Hi SME,
In the attachment, I have a dataset where column C (Value) has a bunch of string with embedded digits. All my rows have embedded digits at the end except rows 155 and 156 where the numbers are embedded in between the pipes ("|").
This is the RegEx that I used to peel the number at the end: (\d+$). How can I modify this so that any rows such 3 or 4 would pick up 1.000000 in the RegEx output and for rows 155/156, would yield the numbers embedded in between the pipes (196266 and 1.000000, respectively)? I've fixed the problem using the Formula tool right after the RegEx too. I would like to see if it's possible to avert the Formula tool in this case with the right RegEx script.
Thanks,
kwl
Solved! Go to Solution.
The following RegEx seems to mostly work: (-?\d+(\.\d+)?)
However, I need to modify this to do two things: one, to ignore the "-" in front of any number and two, to add a period for anything that has 6 digits. I can certainly use the Cleansing tool to get rid of any characters and the Formula tool to add a period in front of any 6 digits, but I would like to see if this can be done in RegEx. Appreciate your help. Thank you!
I'm not sure that you can use The RegEx Tool to dynamically add a period in some cases. You could opt for REGEX_Replace in a Formula Tool to cut down on tool usage:
First expression: REGEX_Replace([Value],"^.*?-?(\d*\.?\d{2,})(?:[^0-9].*$|$)", "$1")
Second: IIF(Length([Regex_Out])=6,"."+[Regex_Out],[Regex_Out])
Hope this helps and Happy Solving!
@knnwndlm one way of doing this
@CoG / @binuacs - I have a similar kind of problem that I'm trying to figure out the best approach. I have column C with a string that has all kinds of gibberish with amounts that I want to peel out. One way to do is to use the Filter tool to get rid of anything that has the word "cost" in it and the Text to Columns tool to get the number. Is there a RegEx approach that might be better?
Thanks,
kwl
@knnwndlm here you go
Regex explanation
--------------------
.*: Matches any characters before the monetary value.
(\$[0-9,]*\.?[0-9]{0,2}): Captures a monetary value (a dollar sign followed by digits, optional commas, optional decimal point, and optional cents).
\s?: Matches an optional whitespace character following the monetary value.
.*: Matches any characters after the monetary value.
explanation of the first regex (?:\|(-?\d*\.?\d+)|\.\s*(-?\d*\.?\d+)|\s(-?\d*\.?\d+))
(?: ... ): This is a non-capturing group that contains three alternatives separated by | (pipe character, dot with optional whitespace, and whitespace).
\|(-?\d*\.?\d+): Matches a numerical value following a pipe character and captures it.
\.\s*(-?\d*\.?\d+): Matches a numerical value following a dot and optional whitespace characters, and captures it.
\s(-?\d*\.?\d+): Matches a numerical value following a whitespace character and captures it.