Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
Register for the upcoming Live Community Q&A Session - and don't forget to submit your questions for @DeanS regarding the future role of analytics here.
SOLVED

RegEx - Extract a Variable Dollar Value N Characters from a Given Word

Highlighted
11 - Bolide

I tried this once before.  I took it down by "solving it", but it was not solved.  It was a hot mess.  I now better understand what I need.

 

I need to extract the dollar value of the string value that is immediately N characters from the word "awarded."  The string value will be >$1,000,000.  The text may include other values, but I only want to extract the first string value immediately N characters from "awarded."  

 

I do not know how to adjust for the variability in the number of characters or in the value of the award nor how to convert it to a dollar value from a string. 

 

I know it has to start. ^.*\bawarded\s. ...

 

Examples

 

Lorem ipsum dolor sit amet, consectetur adipiscing awarded an $8,069,336 elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Augue eget arcu dictum varius duis at consectetur lorem donec. Nunc scelerisque viverra mauris $28,069,336 in aliquam sem fringilla. 

 OR something like this

Lorem ipsum dolor sit amet, consectetur adipiscing awarded to my best friend $9,998,069,336 elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Augue eget arcu dictum varius duis at consectetur lorem donec. Nunc scelerisque viverra mauris $28,069,336 in aliquam sem fringilla.

THANKS!

Highlighted
11 - Bolide

A quick follow up.   This is the current expression.  \bawarded\s.*?([0-9]{7,})

 

I use a formula tool to remove string value's $ sign and commas.  The expression looks for a series of numbers >7 N characters after awarded.  

 

BUT, I am stuck on one small detail.  Some text may have a second occurrence of awarded.  How do I stop the expression after it finds the first awarded and value pair? 

 

AND while it works on regex101.com, it actually does not work at all in Alteryx.  UGH. 

Highlighted
14 - Magnetar
14 - Magnetar

Hello @hellyars 

 

Are you open to a non-RegEx solution? Thought of this approach quickly, since you just want to pull numbers.

 

You can remove all text, punctuation, and extra spaces with the data cleansing. Then a few manipulations. A formula tool gives you only the first number in each row (and a filter tool makes sure the numbers are >=1000000

 

image.png

 

Cheers!

Esther

Highlighted
14 - Magnetar
14 - Magnetar

Please let me know if this works for you, because it's a single tool solution!

 

You can use a multi-field formula tool to get just the first set of numbers (and change the data type to a number(. For whatever reason, the Replace stops after the first number.

image.png

Highlighted
16 - Nebula
16 - Nebula

I suggest:

ToNumber(Replace(REGEX_Replace([Input], "^.*?\bawarded\b.*?\$([0-9,]{6,}).*?$|^.*()$", "$1"),",",""))

Breaking it down the core regex:

^.*?\bawarded\b.*?\$([0-9,]{6,}).*?$

Matches the awarded amount. Using non-greedy ways to match means it should extract whole number correctly into $1

 

This part:

 

|^.*()$

is a trick to mean that if the string does not match then $1 will be empty.

 

After this a traditional REPLACE is used to remove commas and then ToNumber to convert to a number

 

Sample attached.

Highlighted
11 - Bolide

@EstherB47 

 

Thank you.  This works, but I need it to key off the word awarded.   An awarded value will always exist.  If I can get this to work, I can apply it to other keyword value pairs that are less frequently used.  

Highlighted
11 - Bolide

@jdunkerley79 

 

Brilliant.  That works in one fell swoop.  

 

I tested it by adding multiple award + value pairs to the text.  It always matches the first pair.   It looks like ^.*? up front is the key to matching the first occurrence.  It also worked when I added it to the expression I had been toiling with earlier.   Why do you use the second \b in" \bawarded\b" ?

 

THANK YOU!!!

Highlighted
Alteryx Certified Partner
Alteryx Certified Partner

Hi @hellyars 

 

This is a shorter solution:

ToNumber(Replace(REGEX_Replace([Field1], ".*(?<!awarded)awarded.*?\$([\d,]+).*", "$1"), ",", ""))

 

Using negative lookbehind, you assure that you hit your first awarded word.

Then you use non-greedy till the first "$", and group what you want to capture (digits or commas - full number).

Replace commas by nothing and convert the expression to a number.

 

Cheers,

Highlighted
14 - Magnetar
14 - Magnetar

@jdunkerley79 got to this before I did, and his solution is great. Glad we were able to solve this for you!!!

 

You can put the ToNumber and Replace functions in the same formula tool, after the Regex_Replace. Although you can string both together, I find it easier to separate them.

 

Cheers!

Esther

Highlighted
16 - Nebula
16 - Nebula

I used the second \b to allow for something like 'awarded: blah' to match but ' awardeds ' would not.

 

Depends a lot on the input exactly what regex would be best but glad solution worked for you.

Labels