I am struggling with a few RegEx expressions today.
I want to extract the award value of a contract. The (string) value is always in millions. It is always preceded by "awarded", "awarded a" or "awarded an." I want to extract the dollar value and convert it to a number.
How can I do this? I found a similar example in the discussion "Finding String Value and Extracting Dollar Value", but I am struggling a bit, especially with the variation in the phrase that precedes the value I am trying to extract.
See examples below.
Thanks.
Lorem ipsum dolor sit amet, consectetur adipiscing awarded an $8,069,336 elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Augue eget arcu dictum varius duis at consectetur lorem donec. Nunc scelerisque viverra mauris in aliquam sem fringilla.
Lorem ipsum dolor sit amet, consectetur adipiscing awarded a $288,069,336 elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Augue eget arcu dictum varius duis at consectetur lorem donec. Nunc scelerisque viverra mauris in aliquam sem fringilla.
Lorem ipsum dolor sit amet, consectetur adipiscing awarded $18,069,336 elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Augue eget arcu dictum varius duis at consectetur lorem donec. Nunc scelerisque viverra mauris in aliquam sem fringilla.
I am trying to crash my way through this one. I am using regex101.com.
So far, I think I am heading in the right direction with the following:
^.*\bawarded\s.{0,4}(\d+,\d{3},\d{3})
But the {0,4} or {0,N} range does not offer enough flexibility given the combination of spaces and dollar values.
Still trying. This seems to capture the string values, but it captures all of them, not just the one that follows "awarded":
^.*\bawarded\s.+\b$|(\d+,\d{3},\d{3})
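A quick way to see why that pattern grabs every amount is to run it outside regex101. The sketch below uses Python's re module (an assumption, since the thread does not say which engine is ultimately used) and a made-up sample line. Because the line ends in a period, the trailing \b$ in the first alternative cannot match, so the engine falls back to the second alternative, which has no link to "awarded" at all.

```python
import re

# The pattern from the post, unchanged.
PATTERN = r"^.*\bawarded\s.+\b$|(\d+,\d{3},\d{3})"

# Hypothetical sample line with two amounts; only the second follows "awarded".
line = "Paid $1,000,000 earlier, then awarded a $288,069,336 elit."

# The left alternative fails (no word boundary after the final "."),
# so every amount is matched by the right alternative on its own.
amounts = re.findall(PATTERN, line)
print(amounts)  # ['1,000,000', '288,069,336']
```

The "|" splits the whole expression in two, so the "awarded" condition on the left never constrains the capture group on the right.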
Okay...
I have it working on a fixed level. But, I need a more flexible solution. I discovered that at times there might be a few words between "awarded" and the string value. For example, it might read "awarded a not-to-exceed $111,111,111." So, I need an expression that skips over the spaces and characters between "awarded" and the first string value.
Here is the partial solution, but not the dynamic solution.
.*\bawarded\s.{1}(\d+,\d{3},\d{3})|^.*awarded\s.{3}(\d+,\d{3},\d{3})|^.*awarded\s.{4}(\d+,\d{3},\d{3})
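One possible way to make this dynamic, offered as a sketch rather than a confirmed answer: instead of counting characters after "awarded", skip anything that is not a dollar sign or a digit, then require the dollar sign. The example is in Python (an assumption about the target environment), and the names AWARD_RE and award_value are hypothetical helpers introduced for illustration.

```python
import re

# "awarded", then any run of characters that are neither "$" nor a digit,
# then "$" and a comma-grouped number, which is the only captured part.
AWARD_RE = re.compile(r"\bawarded\b[^$\d]*\$(\d{1,3}(?:,\d{3})*)")

def award_value(text):
    """Return the first award amount in text as an int, or None if absent."""
    match = AWARD_RE.search(text)
    if match is None:
        return None
    # Strip the thousands separators before converting to a number.
    return int(match.group(1).replace(",", ""))

samples = [
    "consectetur adipiscing awarded an $8,069,336 elit",
    "consectetur adipiscing awarded a $288,069,336 elit",
    "consectetur adipiscing awarded $18,069,336 elit",
    "awarded a not-to-exceed $111,111,111.",
]
values = [award_value(s) for s in samples]
print(values)  # [8069336, 288069336, 18069336, 111111111]
```

The trade-off is that the skip stops at the first digit, so a line such as "awarded 3 contracts worth $..." would return None rather than a wrong number. Dropping the leading ^.* also means the search stops at the first "awarded" in the line instead of the last one.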
I'm not a RegEx expert, but try this:
.*(award)([^\d]*)([\d,]*)
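For anyone trying this suggestion, here is a small Python sketch of how it behaves (Python is an assumption, and the second sample line is made up). The amount lands in the third group. One thing to watch is that the leading .* is greedy, so the expression anchors on the last "award" in the line; this may or may not be the issue the original poster ran into.

```python
import re

# The suggested pattern, unchanged.
SUGGESTED = r".*(award)([^\d]*)([\d,]*)"

# Sample from the question: a single "award", amount ends up in group 3.
first = re.search(SUGGESTED, "adipiscing awarded an $8,069,336 elit")
print(first.group(3))  # 8,069,336

# Hypothetical line where "award" appears again after the amount:
# the greedy .* skips to the last "award", and group 3 comes back empty.
second = re.search(
    SUGGESTED, "awarded a $5,000,000 contract; the award was announced."
)
print(repr(second.group(3)))  # ''
```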
Thanks. This path ended up being a dead end. But, it did highlight a few variables in my source data that I had not factored in. Here is the updated post / question.