Hi Everyone, I ran into a problem when cleaning up data from a PDF file.
Say if I have this string in a cell and I want the output to get the first part of the string and the big number with comma in the middle. How can I tell Alteryx to do that? Thanks!
Input:
"QUARTER WAGE RECAP: EMPLOYEES: TOTAL MALE FEM NEW SUI SUBJECT WAGES 1,999,999.70 MONTH 1 91 53 38 HIRE"
Output:
QUARTER WAGE RECAP: | 1,999,999.70 |
Solved! Go to Solution.
Hi @zoeuno
Not so fancy solution, but this one worked with your numbers.
I tested in some scenarios, but you should look at more variations of your data.
REGEX_Replace([Field1], "^(.*?:).*?((\d+,)+)(\d+)(\.\d+)?.*", "$1$2$4$5")
Cheers,
Hi! Just tried and it worked for that line. I do have another variation and it's
"EXCESS SUI WAGES 8,160,812.87 MONTH 2 96 56 40". In this case I want
EXCESS SUI WAGES | 8,160,812.87 |
Also just curious, what does "$1$2$4$5" mean in the previous formula? Thanks again!
This would also work looking at your data:
REGEX_Replace([Field1], "(.*?:).*\s(.*)\sMONTH.*", "$1$2")
$1, $2, $3...they represent the captured groups in order (the patterns of REGEX that are between parenthesis).
Your second example is a bit harder to fit in a REGEX since it's a totally different pattern from the first row.
REGEX_Replace([Field1], "^[^A-Z]*(.*?)(?=[^\sA-Z]).*?(\d.*)\sMONTH.*", "$1 $2")
This is the closest I could get. Which is really complicated REGEX to explain.
To learn more about it, I recommend this website : https://www.rexegg.com/
Cheers,