Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Extracting Specific Format number from String

zoeuno
7 - Meteor

Hi Everyone, I ran into a problem when cleaning up data from a PDF file.

 

Say if I have this string in a cell and I want the output to get the first part of the string and the big number with comma in the middle. How can I tell Alteryx to do that? Thanks!

 

Input:

"QUARTER WAGE RECAP: EMPLOYEES: TOTAL MALE FEM NEW SUI SUBJECT WAGES 1,999,999.70 MONTH 1 91 53 38 HIRE"

 

Output:

QUARTER WAGE RECAP:1,999,999.70
 
 
  •  
3 REPLIES 3
Thableaus
17 - Castor
17 - Castor

Hi @zoeuno 

 

Not so fancy solution, but this one worked with your numbers.

 

I tested in some scenarios, but you should look at more variations of your data.

 

REGEX_Replace([Field1], "^(.*?:).*?((\d+,)+)(\d+)(\.\d+)?.*", "$1$2$4$5")

 

 

Cheers,

zoeuno
7 - Meteor

Hi! Just tried and it worked for that line. I do have another variation and it's 

"EXCESS SUI WAGES 8,160,812.87 MONTH 2 96 56 40". In this case I want 

EXCESS SUI WAGES8,160,812.87

 

Also just curious, what does "$1$2$4$5" mean in the previous formula? Thanks again!

Thableaus
17 - Castor
17 - Castor

@zoeuno 

 

This would also work looking at your data:

 

REGEX_Replace([Field1], "(.*?:).*\s(.*)\sMONTH.*", "$1$2")

 

$1, $2, $3...they represent the captured groups in order (the patterns of REGEX that are between parenthesis).

 

Your second example is a bit harder to fit in a REGEX since it's a totally different pattern from the first row.

 

REGEX_Replace([Field1], "^[^A-Z]*(.*?)(?=[^\sA-Z]).*?(\d.*)\sMONTH.*", "$1 $2")

 

This is the closest I could get. Which is really complicated REGEX to explain.


To learn more about it, I recommend this website : https://www.rexegg.com/

 

Cheers,

 

Labels
Top Solution Authors