ALTERYX INSPIRE | Join us this May for for a multi-day virtual analytics + data science experience like no other! Register Now

Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
SOLVED

RegEx - Extract a Variable Dollar Value N Characters from a Given Word

hellyars
12 - Quasar

I tried this once before.  I took it down by "solving it", but it was not solved.  It was a hot mess.  I now better understand what I need.

 

I need to extract the dollar value of the string value that is immediately N characters from the word "awarded."  The string value will be >$1,000,000.  The text may include other values, but I only want to extract the first string value immediately N characters from "awarded."  

 

I do not know how to adjust for the variability in the number of characters or in the value of the award nor how to convert it to a dollar value from a string. 

 

I know it has to start. ^.*\bawarded\s. ...

 

Examples

 

Lorem ipsum dolor sit amet, consectetur adipiscing awarded an $8,069,336 elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Augue eget arcu dictum varius duis at consectetur lorem donec. Nunc scelerisque viverra mauris $28,069,336 in aliquam sem fringilla. 

 OR something like this

Lorem ipsum dolor sit amet, consectetur adipiscing awarded to my best friend $9,998,069,336 elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Augue eget arcu dictum varius duis at consectetur lorem donec. Nunc scelerisque viverra mauris $28,069,336 in aliquam sem fringilla.

THANKS!

hellyars
12 - Quasar

A quick follow up.   This is the current expression.  \bawarded\s.*?([0-9]{7,})

 

I use a formula tool to remove string value's $ sign and commas.  The expression looks for a series of numbers >7 N characters after awarded.  

 

BUT, I am stuck on one small detail.  Some text may have a second occurrence of awarded.  How do I stop the expression after it finds the first awarded and value pair? 

 

AND while it works on regex101.com, it actually does not work at all in Alteryx.  UGH. 

estherb47
15 - Aurora
15 - Aurora

Hello @hellyars 

 

Are you open to a non-RegEx solution? Thought of this approach quickly, since you just want to pull numbers.

 

You can remove all text, punctuation, and extra spaces with the data cleansing. Then a few manipulations. A formula tool gives you only the first number in each row (and a filter tool makes sure the numbers are >=1000000

 

image.png

 

Cheers!

Esther

estherb47
15 - Aurora
15 - Aurora

Please let me know if this works for you, because it's a single tool solution!

 

You can use a multi-field formula tool to get just the first set of numbers (and change the data type to a number(. For whatever reason, the Replace stops after the first number.

image.png

jdunkerley79
16 - Nebula
16 - Nebula

I suggest:

ToNumber(Replace(REGEX_Replace([Input], "^.*?\bawarded\b.*?\$([0-9,]{6,}).*?$|^.*()$", "$1"),",",""))

Breaking it down the core regex:

^.*?\bawarded\b.*?\$([0-9,]{6,}).*?$

Matches the awarded amount. Using non-greedy ways to match means it should extract whole number correctly into $1

 

This part:

 

|^.*()$

is a trick to mean that if the string does not match then $1 will be empty.

 

After this a traditional REPLACE is used to remove commas and then ToNumber to convert to a number

 

Sample attached.

hellyars
12 - Quasar

@estherb47 

 

Thank you.  This works, but I need it to key off the word awarded.   An awarded value will always exist.  If I can get this to work, I can apply it to other keyword value pairs that are less frequently used.  

hellyars
12 - Quasar

@jdunkerley79 

 

Brilliant.  That works in one fell swoop.  

 

I tested it by adding multiple award + value pairs to the text.  It always matches the first pair.   It looks like ^.*? up front is the key to matching the first occurrence.  It also worked when I added it to the expression I had been toiling with earlier.   Why do you use the second \b in" \bawarded\b" ?

 

THANK YOU!!!

Thableaus
17 - Castor
17 - Castor

Hi @hellyars 

 

This is a shorter solution:

ToNumber(Replace(REGEX_Replace([Field1], ".*(?<!awarded)awarded.*?\$([\d,]+).*", "$1"), ",", ""))

 

Using negative lookbehind, you assure that you hit your first awarded word.

Then you use non-greedy till the first "$", and group what you want to capture (digits or commas - full number).

Replace commas by nothing and convert the expression to a number.

 

Cheers,

estherb47
15 - Aurora
15 - Aurora

@jdunkerley79 got to this before I did, and his solution is great. Glad we were able to solve this for you!!!

 

You can put the ToNumber and Replace functions in the same formula tool, after the Regex_Replace. Although you can string both together, I find it easier to separate them.

 

Cheers!

Esther

jdunkerley79
16 - Nebula
16 - Nebula

I used the second \b to allow for something like 'awarded: blah' to match but ' awardeds ' would not.

 

Depends a lot on the input exactly what regex would be best but glad solution worked for you.

Labels