Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

RegEx - Extract a Variable Dollar Value N Characters from a Given Word

hellyars
13 - Pulsar

I tried this once before.  I took it down by "solving it", but it was not solved.  It was a hot mess.  I now better understand what I need.

 

I need to extract the dollar value of the string value that is immediately N characters from the word "awarded."  The string value will be >$1,000,000.  The text may include other values, but I only want to extract the first string value immediately N characters from "awarded."  

 

I do not know how to adjust for the variability in the number of characters or in the value of the award nor how to convert it to a dollar value from a string. 

 

I know it has to start. ^.*\bawarded\s. ...

 

Examples

 

Lorem ipsum dolor sit amet, consectetur adipiscing awarded an $8,069,336 elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Augue eget arcu dictum varius duis at consectetur lorem donec. Nunc scelerisque viverra mauris $28,069,336 in aliquam sem fringilla. 

 OR something like this

Lorem ipsum dolor sit amet, consectetur adipiscing awarded to my best friend $9,998,069,336 elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Augue eget arcu dictum varius duis at consectetur lorem donec. Nunc scelerisque viverra mauris $28,069,336 in aliquam sem fringilla.

THANKS!

12 REPLIES 12
hellyars
13 - Pulsar

A quick follow up.   This is the current expression.  \bawarded\s.*?([0-9]{7,})

 

I use a formula tool to remove string value's $ sign and commas.  The expression looks for a series of numbers >7 N characters after awarded.  

 

BUT, I am stuck on one small detail.  Some text may have a second occurrence of awarded.  How do I stop the expression after it finds the first awarded and value pair? 

 

AND while it works on regex101.com, it actually does not work at all in Alteryx.  UGH. 

estherb47
15 - Aurora
15 - Aurora

Hello @hellyars 

 

Are you open to a non-RegEx solution? Thought of this approach quickly, since you just want to pull numbers.

 

You can remove all text, punctuation, and extra spaces with the data cleansing. Then a few manipulations. A formula tool gives you only the first number in each row (and a filter tool makes sure the numbers are >=1000000

 

image.png

 

Cheers!

Esther

estherb47
15 - Aurora
15 - Aurora

Please let me know if this works for you, because it's a single tool solution!

 

You can use a multi-field formula tool to get just the first set of numbers (and change the data type to a number(. For whatever reason, the Replace stops after the first number.

image.png

jdunkerley79
ACE Emeritus
ACE Emeritus

I suggest:

ToNumber(Replace(REGEX_Replace([Input], "^.*?\bawarded\b.*?\$([0-9,]{6,}).*?$|^.*()$", "$1"),",",""))

Breaking it down the core regex:

^.*?\bawarded\b.*?\$([0-9,]{6,}).*?$

Matches the awarded amount. Using non-greedy ways to match means it should extract whole number correctly into $1

 

This part:

 

|^.*()$

is a trick to mean that if the string does not match then $1 will be empty.

 

After this a traditional REPLACE is used to remove commas and then ToNumber to convert to a number

 

Sample attached.

hellyars
13 - Pulsar

@estherb47 

 

Thank you.  This works, but I need it to key off the word awarded.   An awarded value will always exist.  If I can get this to work, I can apply it to other keyword value pairs that are less frequently used.  

hellyars
13 - Pulsar

@jdunkerley79 

 

Brilliant.  That works in one fell swoop.  

 

I tested it by adding multiple award + value pairs to the text.  It always matches the first pair.   It looks like ^.*? up front is the key to matching the first occurrence.  It also worked when I added it to the expression I had been toiling with earlier.   Why do you use the second \b in" \bawarded\b" ?

 

THANK YOU!!!

Thableaus
17 - Castor
17 - Castor

Hi @hellyars 

 

This is a shorter solution:

ToNumber(Replace(REGEX_Replace([Field1], ".*(?<!awarded)awarded.*?\$([\d,]+).*", "$1"), ",", ""))

 

Using negative lookbehind, you assure that you hit your first awarded word.

Then you use non-greedy till the first "$", and group what you want to capture (digits or commas - full number).

Replace commas by nothing and convert the expression to a number.

 

Cheers,

estherb47
15 - Aurora
15 - Aurora

@jdunkerley79 got to this before I did, and his solution is great. Glad we were able to solve this for you!!!

 

You can put the ToNumber and Replace functions in the same formula tool, after the Regex_Replace. Although you can string both together, I find it easier to separate them.

 

Cheers!

Esther

jdunkerley79
ACE Emeritus
ACE Emeritus

I used the second \b to allow for something like 'awarded: blah' to match but ' awardeds ' would not.

 

Depends a lot on the input exactly what regex would be best but glad solution worked for you.

Labels