Alteryx Designer Desktop Discussions

SOLVED

Find String Value N Characters from "Awarded" and Extract Dollar Value

hellyars
13 - Pulsar

I am struggling with a few RegEx expressions today.

 

I want to extract the award value of a contract. The value is stored as a string and is always in the millions. It is always preceded by "awarded", "awarded a", or "awarded an". I want to extract the dollar value and convert it to a number.

 

How can I do this? I found a similar example in the discussions, Finding String Value and Extracting Dollar Value, but I am struggling a bit, especially with the variation in the phrase that precedes the value I am trying to extract.

 

See examples below.

 

Thanks.

 

Lorem ipsum dolor sit amet, consectetur adipiscing awarded an $8,069,336 elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Augue eget arcu dictum varius duis at consectetur lorem donec. Nunc scelerisque viverra mauris in aliquam sem fringilla. 
Lorem ipsum dolor sit amet, consectetur adipiscing awarded a $288,069,336 elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Augue eget arcu dictum varius duis at consectetur lorem donec. Nunc scelerisque viverra mauris in aliquam sem fringilla. 
Lorem ipsum dolor sit amet, consectetur adipiscing awarded $18,069,336 elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Augue eget arcu dictum varius duis at consectetur lorem donec. Nunc scelerisque viverra mauris in aliquam sem fringilla.  

 

5 REPLIES
hellyars
13 - Pulsar

I am trying to crash my way through this one.  I am using regex101.com.

 

So far, I think I am heading in the right direction with the following

 

^.*\bawarded\s.{0,4}(\d+,\d{3},\d{3})

But the {0,4} or {0,N} range does not offer enough flexibility given the combination of spaces and dollar values. 
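A quick way to see the limitation is to run the pattern against shortened copies of the three sample sentences. This is a minimal sketch in Python's `re` module rather than Alteryx, on the assumption that both engines treat these constructs the same way; the helper name `first_capture` is made up for illustration.

```python
import re

# Pattern copied from the post above.
PATTERN = re.compile(r"^.*\bawarded\s.{0,4}(\d+,\d{3},\d{3})")

# Shortened versions of the three sample sentences.
samples = [
    "adipiscing awarded an $8,069,336 elit",
    "adipiscing awarded a $288,069,336 elit",
    "adipiscing awarded $18,069,336 elit",
]

def first_capture(text):
    """Return capture group 1, or None when the pattern does not match."""
    m = PATTERN.search(text)
    return m.group(1) if m else None

captures = [first_capture(s) for s in samples]
for s, c in zip(samples, captures):
    print(f"{c!r:>14}  <-  {s}")
```

Because `.{0,4}` is greedy and `.` also matches digits, the gap can swallow the leading digits of the amount: the second sample yields `88,069,336` and the third yields `8,069,336`. So the fixed range is not only too short for longer phrases, it can also truncate the value when the phrase is short.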

hellyars
13 - Pulsar

Still trying. This seems to capture the string values, but it captures all of them, not just the one that follows "awarded".

 

^.*\bawarded\s.+\b$|(\d+,\d{3},\d{3})
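One likely explanation is that the `|` splits the expression into two independent alternatives, so the number pattern on the right is never tied to "awarded". The Python sketch below illustrates this with a made-up sentence containing two amounts; it assumes Python's `re` behaves like the Alteryx engine for these constructs.

```python
import re

# Pattern copied from the post above: two alternatives separated by "|".
PATTERN = re.compile(r"^.*\bawarded\s.+\b$|(\d+,\d{3},\d{3})")

# Made-up sentence with one amount before "awarded" and one after it.
# It ends in ".", so the left alternative (which needs a word boundary
# at the very end) fails and only the right alternative is used.
text = "A $1,000,000 base contract was awarded a $2,500,000 modification."
all_amounts = PATTERN.findall(text)
print(all_amounts)

# Same sentence without the final ".": now the left alternative matches
# the whole line, and the capture group is left empty.
no_period = PATTERN.findall(text.rstrip("."))
print(no_period)
```

In the first case every amount in the line is returned, including the one that appears before "awarded". In the second case the whole line matches but nothing is captured. Either way, the capture group is not restricted to the value that follows "awarded".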

 

hellyars
13 - Pulsar

Okay...

 

I have it working for gaps of a fixed length, but I need a more flexible solution. I discovered that at times there might be a few words between "awarded" and the string value. For example, it might read "awarded a not-to-exceed $111,111,111." So, I need an expression that skips over the spaces and characters between "awarded" and the first string value.

 

Here is the partial solution, but not the dynamic solution.

 

.*\bawarded\s.{1}(\d+,\d{3},\d{3})|^.*awarded\s.{3}(\d+,\d{3},\d{3})|^.*awarded\s.{4}(\d+,\d{3},\d{3})
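One way to avoid enumerating gap lengths is to skip every character that is neither `$` nor a digit, and then require the `$`. The sketch below is in Python, not Alteryx, and is only one possible approach: the pattern and the helper `extract_award` are hypothetical, and it assumes that no other digits or dollar signs appear between "awarded" and the amount.

```python
import re

# Hypothetical pattern, not from the thread: after "awarded", skip anything
# that is neither "$" nor a digit, then require "$" followed by a
# comma-grouped number, which is the only captured part.
AWARD_RE = re.compile(r"\bawarded\b[^$\d]*\$(\d{1,3}(?:,\d{3})*)")

def extract_award(text):
    """Return the first dollar amount after 'awarded' as an int, or None."""
    m = AWARD_RE.search(text)
    if m is None:
        return None
    # Strip the thousands separators before converting to a number.
    return int(m.group(1).replace(",", ""))

samples = [
    "adipiscing awarded an $8,069,336 elit",
    "adipiscing awarded a $288,069,336 elit",
    "adipiscing awarded $18,069,336 elit",
    "adipiscing awarded a not-to-exceed $111,111,111.",
]
for s in samples:
    print(extract_award(s), "<-", s)
```

With a single capture group the value always lands in the same place, whereas the three-way alternation above puts it in group 1, 2, or 3 depending on which branch matched.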

 

 

ChrisTX
16 - Nebula

I'm not a RegEx expert, but try this:

 

.*(award)([^\d]*)([\d,]*)
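For anyone who wants to try this expression outside Alteryx, here is a small Python sketch of how its three groups behave. The helper `award_groups` and the second test sentence are made up for illustration, and the sketch assumes Python's `re` matches the Alteryx engine on these constructs.

```python
import re

# Pattern copied from the reply above.
CHRIS_RE = re.compile(r".*(award)([^\d]*)([\d,]*)")

def award_groups(text):
    """Return the three capture groups as a tuple, or None if no match."""
    m = CHRIS_RE.search(text)
    return m.groups() if m else None

# Group 3 holds the digits and commas; strip the commas to get a number.
ok = award_groups("adipiscing awarded a not-to-exceed $111,111,111 elit")
value = int(ok[2].replace(",", ""))
print(ok, value)

# Made-up sentence with a second "award" later in the text. The leading
# greedy ".*" jumps to the LAST "award", so group 3 comes back empty.
late = award_groups("awarded $5,000,000; the award is final")
print(late)
```

The expression handles a gap of any length as long as it contains no digits, but two things are worth watching for: it keys on "award" rather than "awarded", and the greedy `.*` anchors on the last occurrence of that word in the text.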

 

 

hellyars
13 - Pulsar

@ChrisTX 

 

Thanks. This path ended up being a dead end, but it did highlight a few variables in my source data that I had not factored in. Here is the updated post / question.

 

 

 

 
