I tried this once before. I took it down by "solving it", but it was not solved. It was a hot mess. I now better understand what I need.
I need to extract the dollar value of the string value that is immediately N characters from the word "awarded." The string value will be >$1,000,000. The text may include other values, but I only want to extract the first string value immediately N characters from "awarded."
I do not know how to adjust for the variability in the number of characters or in the value of the award nor how to convert it to a dollar value from a string.
I know it has to start. ^.*\bawarded\s. ...
Examples
Lorem ipsum dolor sit amet, consectetur adipiscing awarded an $8,069,336 elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Augue eget arcu dictum varius duis at consectetur lorem donec. Nunc scelerisque viverra mauris $28,069,336 in aliquam sem fringilla.
OR something like this
Lorem ipsum dolor sit amet, consectetur adipiscing awarded to my best friend $9,998,069,336 elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Augue eget arcu dictum varius duis at consectetur lorem donec. Nunc scelerisque viverra mauris $28,069,336 in aliquam sem fringilla.
THANKS!
Solved! Go to Solution.
A quick follow up. This is the current expression. \bawarded\s.*?([0-9]{7,})
I use a formula tool to remove string value's $ sign and commas. The expression looks for a series of numbers >7 N characters after awarded.
BUT, I am stuck on one small detail. Some text may have a second occurrence of awarded. How do I stop the expression after it finds the first awarded and value pair?
AND while it works on regex101.com, it actually does not work at all in Alteryx. UGH.
Hello @hellyars
Are you open to a non-RegEx solution? Thought of this approach quickly, since you just want to pull numbers.
You can remove all text, punctuation, and extra spaces with the data cleansing. Then a few manipulations. A formula tool gives you only the first number in each row (and a filter tool makes sure the numbers are >=1000000
Cheers!
Esther
Please let me know if this works for you, because it's a single tool solution!
You can use a multi-field formula tool to get just the first set of numbers (and change the data type to a number(. For whatever reason, the Replace stops after the first number.
I suggest:
ToNumber(Replace(REGEX_Replace([Input], "^.*?\bawarded\b.*?\$([0-9,]{6,}).*?$|^.*()$", "$1"),",",""))
Breaking it down the core regex:
^.*?\bawarded\b.*?\$([0-9,]{6,}).*?$
Matches the awarded amount. Using non-greedy ways to match means it should extract whole number correctly into $1
This part:
|^.*()$
is a trick to mean that if the string does not match then $1 will be empty.
After this a traditional REPLACE is used to remove commas and then ToNumber to convert to a number
Sample attached.
Thank you. This works, but I need it to key off the word awarded. An awarded value will always exist. If I can get this to work, I can apply it to other keyword value pairs that are less frequently used.
Brilliant. That works in one fell swoop.
I tested it by adding multiple award + value pairs to the text. It always matches the first pair. It looks like ^.*? up front is the key to matching the first occurrence. It also worked when I added it to the expression I had been toiling with earlier. Why do you use the second \b in" \bawarded\b" ?
THANK YOU!!!
Hi @hellyars
This is a shorter solution:
ToNumber(Replace(REGEX_Replace([Field1], ".*(?<!awarded)awarded.*?\$([\d,]+).*", "$1"), ",", ""))
Using negative lookbehind, you assure that you hit your first awarded word.
Then you use non-greedy till the first "$", and group what you want to capture (digits or commas - full number).
Replace commas by nothing and convert the expression to a number.
Cheers,
@jdunkerley79 got to this before I did, and his solution is great. Glad we were able to solve this for you!!!
You can put the ToNumber and Replace functions in the same formula tool, after the Regex_Replace. Although you can string both together, I find it easier to separate them.
Cheers!
Esther
I used the second \b to allow for something like 'awarded: blah' to match but ' awardeds ' would not.
Depends a lot on the input exactly what regex would be best but glad solution worked for you.