Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Extracting Specific Format number from String

zoeuno
7 - Meteor

Hi Everyone, I ran into a problem when cleaning up data from a PDF file.

 

Say if I have this string in a cell and I want the output to get the first part of the string and the big number with comma in the middle. How can I tell Alteryx to do that? Thanks!

 

Input:

"QUARTER WAGE RECAP: EMPLOYEES: TOTAL MALE FEM NEW SUI SUBJECT WAGES 1,999,999.70 MONTH 1 91 53 38 HIRE"

 

Output:

QUARTER WAGE RECAP:1,999,999.70
 
 
  •  
3 REPLIES 3
Thableaus
17 - Castor
17 - Castor

Hi @zoeuno 

 

Not so fancy solution, but this one worked with your numbers.

 

I tested in some scenarios, but you should look at more variations of your data.

 

REGEX_Replace([Field1], "^(.*?:).*?((\d+,)+)(\d+)(\.\d+)?.*", "$1$2$4$5")

 

 

Cheers,

zoeuno
7 - Meteor

Hi! Just tried and it worked for that line. I do have another variation and it's 

"EXCESS SUI WAGES 8,160,812.87 MONTH 2 96 56 40". In this case I want 

EXCESS SUI WAGES8,160,812.87

 

Also just curious, what does "$1$2$4$5" mean in the previous formula? Thanks again!

Thableaus
17 - Castor
17 - Castor

@zoeuno 

 

This would also work looking at your data:

 

REGEX_Replace([Field1], "(.*?:).*\s(.*)\sMONTH.*", "$1$2")

 

$1, $2, $3...they represent the captured groups in order (the patterns of REGEX that are between parenthesis).

 

Your second example is a bit harder to fit in a REGEX since it's a totally different pattern from the first row.

 

REGEX_Replace([Field1], "^[^A-Z]*(.*?)(?=[^\sA-Z]).*?(\d.*)\sMONTH.*", "$1 $2")

 

This is the closest I could get. Which is really complicated REGEX to explain.


To learn more about it, I recommend this website : https://www.rexegg.com/

 

Cheers,

 

Labels