Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Help with Regex

HW1
9 - Comet

I have a column with the values as:

 

Value
30/11/20 240L Permanent Security Bin for the month of December Bin Rent 1 8.00 8.00
30/11/20 Foot pedal bin- on site use for the month of December -_| Bin Rent 2| 20.00 40.00
26/11/20 | JOB-2780456-YOM8& 120L Clinical Waste Bin Service 1 51.55 51.55
24/11/20 | JOB-2783621-T1S7 240L Clinical Waste Bin: Collect 8 65.00 520.00
10/11/20 | JOB-2783620-P2M3 2A40L Clinical Waste Bin Service 1 65.00 65.00

 

I want them to be parsed as:

 

DateJob NumberDescriptionActionQtyRateTotal
30/11/2020 240L Permanent Security Bin for the month of DecemberBin Rent188
30/11/2020 Foot pedal bin- on site use for the month of DecemberBin Rent22040
26/11/2020JOB-2780456-YOM8120L Clinical Waste BinService151.5551.55
24/11/2020JOB-2783621-T1S7240L Clinical Waste BinCollect865520
10/11/2020JOB-2783620-P2M32A40L Clinical Waste BinService16565

 

I am trying the RegEx tool with Parsing date then filter and a lot of other things but I am unable to get it right.

My current RegEx parsing is: 

 

(\d+/\d+/\d+) \| ([A-Z0-9\-]+) (.*) (Service|OneOff) ([0-9|\.]+) ([0-9|\.]+) (.*)

 

Can anybody help?

7 REPLIES 7
BretCarr
10 - Fireball

I am going to take a crack at this... however, is this data consistent with your data's reality? What I mean is that the first line is missing the dividing pipes that the rest have.

 

I just want to make sure that this is intentional on your part. 😁

 

Stay tuned.

HW1
9 - Comet

@BretCarr 
Of course!
The data is from a PDF parser. I receive the PDF files and I run through the pdf parser which you can see in the input data (Value) co.

These patterns are all the variations of data that I receive. If it helps, I can put in more variations.
Thank you for asking the question.

BretCarr
10 - Fireball

PDF parsing? Gross! Haha

 

In order to get you something useful, I'll need all the gross variations you run into. I'm almost finished with your initial set.

HW1
9 - Comet

I am with you!
PDF parsing is a task on a daily basis hence moving the process on Alteryx.
The next step will be to figure out how I can parse multiple PDFs at the same time (all files in the folder) as the data is always arranged in a matrix.

 

Please find a few more variations I could find.

Value
30/11/20 240L Permanent Security Bin for the month of December Bin Rent 1 8.00 8.00
30/11/20 Foot pedal bin- on site use for the month of December -_| Bin Rent 2| 20.00 40.00
26/11/20 | JOB-2780456-YOM8& 120L Clinical Waste Bin Service 1 51.55 51.55
24/11/20 | JOB-2783621-T1S7 240L Clinical Waste Bin: Collect 8 65.00 520.00
10/11/20 | JOB-2783620-P2M3 2A40L Clinical Waste Bin Service 1 65.00 65.00
19/01/21 | JOB-2839597-G1C4 Sharps Container 1.8L Sharpak Service 4 23.48 93.92
26/01/21 | JOB-2855290-NOW9 Futile Service Fee OneOff 1 50.00 50.00
29/01/21 | JOB-3021769-R6V2 120L COVID19 Waste Bin FOS 1 91.80 91.80

 

Parsed as:

DateJob NumberDescriptionActionQtyRateTotal
30/11/2020 240L Permanent Security Bin for the month of DecemberBin Rent188
30/11/2020 Foot pedal bin- on site use for the month of DecemberBin Rent22040
26/11/2020JOB-2780456-YOM8120L Clinical Waste BinService151.5551.55
24/11/2020JOB-2783621-T1S7240L Clinical Waste BinCollect865520
10/11/2020JOB-2783620-P2M32A40L Clinical Waste BinService16565
19/01/2021JOB-2839597-G1C4Sharps Container 1.8L SharpakService423.4893.92
26/01/2021JOB-2855290-NOW9Futile Service FeeOneOff15050
29/01/2021JOB-3021769-R6V2120L COVID19 Waste BinFOS191.891.8
Qiu
20 - Arcturus
20 - Arcturus

@HW1 

Answered your another one though.

0209-HW1-2.PNG

BretCarr
10 - Fireball

I am with @Qiu  regarding using a cleansing tool if you are scraping a PDF (*gag*). 🤣

 

If you can get rid of the symbols that should never exist (basically everything but the period), then you can always add the Job hyphen back in after the fact.

 

Are there any other formatting you need coming out? I was able to do the initial formatting of the expression as well though @Qiu  is wonderful at uploading a nice workflow to go along with it which is probably more helpful! I'll see what I can do making my own assumptions.

BretCarr
10 - Fireball

Okay, so here is my first post attaching the actual workflow.

Oh yeah!Oh yeah!

Main Parse:

 

(\d+\/\d+\/\d+) (JOB[0-9A-Z]*)?\s?(.*)\s(Bin Rent|Collect|FOS|OneOff|Service)\s(\d+)\s(\d+\.\d+)\s(\d+\.\d+)

 

 

Job Fix RegEx formula:

 

REGEX_Replace([Job Number], "(JOB)(\d{7})([A-Z0-9]*)", '$1' + '-' + '$2' + '-' + '$3')

 

Hope this helps!

Labels