Alteryx Designer Desktop Discussions

zoeuno · ‎05-23-2019

Hi Everyone, I ran into a problem when cleaning up data from a PDF file.

Say if I have this string in a cell and I want the output to get the first part of the string and the big number with comma in the middle. How can I tell Alteryx to do that? Thanks!

Input:

"QUARTER WAGE RECAP: EMPLOYEES: TOTAL MALE FEM NEW SUI SUBJECT WAGES 1,999,999.70 MONTH 1 91 53 38 HIRE"

Output:

QUARTER WAGE RECAP:

1,999,999.70

Thableaus · ‎05-23-2019

Hi @zoeuno

Not so fancy solution, but this one worked with your numbers.

I tested in some scenarios, but you should look at more variations of your data.

REGEX_Replace([Field1], "^(.*?:).*?((\d+,)+)(\d+)(\.\d+)?.*", "$1$2$4$5")

Cheers,

zoeuno · ‎05-23-2019

Hi! Just tried and it worked for that line. I do have another variation and it's

"EXCESS SUI WAGES 8,160,812.87 MONTH 2 96 56 40". In this case I want

EXCESS SUI WAGES

8,160,812.87

Also just curious, what does "$1$2$4$5" mean in the previous formula? Thanks again!

Thableaus · ‎05-23-2019

@zoeuno

This would also work looking at your data:

REGEX_Replace([Field1], "(.*?:).*\s(.*)\sMONTH.*", "$1$2")

$1, $2, $3...they represent the captured groups in order (the patterns of REGEX that are between parenthesis).

Your second example is a bit harder to fit in a REGEX since it's a totally different pattern from the first row.

REGEX_Replace([Field1], "^[^A-Z]*(.*?)(?=[^\sA-Z]).*?(\d.*)\sMONTH.*", "$1 $2")

This is the closest I could get. Which is really complicated REGEX to explain.

To learn more about it, I recommend this website : https://www.rexegg.com/

Cheers,

Alteryx Designer Desktop Discussions

Extracting Specific Format number from String

Re: Unable to get an output

Example workflow for setting up a custom list to u...

Re: Firm names parse

Re: Help with Multi-Row formula

Re: Assign Random data to Executive with limited p...