I have a column with the values as:
Value |
30/11/20 240L Permanent Security Bin for the month of December Bin Rent 1 8.00 8.00 |
30/11/20 Foot pedal bin- on site use for the month of December -_| Bin Rent 2| 20.00 40.00 |
26/11/20 | JOB-2780456-YOM8& 120L Clinical Waste Bin Service 1 51.55 51.55 |
24/11/20 | JOB-2783621-T1S7 240L Clinical Waste Bin: Collect 8 65.00 520.00 |
10/11/20 | JOB-2783620-P2M3 2A40L Clinical Waste Bin Service 1 65.00 65.00 |
I want them to be parsed as:
Date | Job Number | Description | Action | Qty | Rate | Total |
30/11/2020 | 240L Permanent Security Bin for the month of December | Bin Rent | 1 | 8 | 8 | |
30/11/2020 | Foot pedal bin- on site use for the month of December | Bin Rent | 2 | 20 | 40 | |
26/11/2020 | JOB-2780456-YOM8 | 120L Clinical Waste Bin | Service | 1 | 51.55 | 51.55 |
24/11/2020 | JOB-2783621-T1S7 | 240L Clinical Waste Bin | Collect | 8 | 65 | 520 |
10/11/2020 | JOB-2783620-P2M3 | 2A40L Clinical Waste Bin | Service | 1 | 65 | 65 |
I am trying the RegEx tool with Parsing date then filter and a lot of other things but I am unable to get it right.
My current RegEx parsing is:
(\d+/\d+/\d+) \| ([A-Z0-9\-]+) (.*) (Service|OneOff) ([0-9|\.]+) ([0-9|\.]+) (.*)
Can anybody help?
Solved! Go to Solution.
I am going to take a crack at this... however, is this data consistent with your data's reality? What I mean is that the first line is missing the dividing pipes that the rest have.
I just want to make sure that this is intentional on your part. 😁
Stay tuned.
@BretCarr
Of course!
The data is from a PDF parser. I receive the PDF files and I run through the pdf parser which you can see in the input data (Value) co.
These patterns are all the variations of data that I receive. If it helps, I can put in more variations.
Thank you for asking the question.
PDF parsing? Gross! Haha
In order to get you something useful, I'll need all the gross variations you run into. I'm almost finished with your initial set.
I am with you!
PDF parsing is a task on a daily basis hence moving the process on Alteryx.
The next step will be to figure out how I can parse multiple PDFs at the same time (all files in the folder) as the data is always arranged in a matrix.
Please find a few more variations I could find.
Value |
30/11/20 240L Permanent Security Bin for the month of December Bin Rent 1 8.00 8.00 |
30/11/20 Foot pedal bin- on site use for the month of December -_| Bin Rent 2| 20.00 40.00 |
26/11/20 | JOB-2780456-YOM8& 120L Clinical Waste Bin Service 1 51.55 51.55 |
24/11/20 | JOB-2783621-T1S7 240L Clinical Waste Bin: Collect 8 65.00 520.00 |
10/11/20 | JOB-2783620-P2M3 2A40L Clinical Waste Bin Service 1 65.00 65.00 |
19/01/21 | JOB-2839597-G1C4 Sharps Container 1.8L Sharpak Service 4 23.48 93.92 |
26/01/21 | JOB-2855290-NOW9 Futile Service Fee OneOff 1 50.00 50.00 |
29/01/21 | JOB-3021769-R6V2 120L COVID19 Waste Bin FOS 1 91.80 91.80 |
Parsed as:
Date | Job Number | Description | Action | Qty | Rate | Total |
30/11/2020 | 240L Permanent Security Bin for the month of December | Bin Rent | 1 | 8 | 8 | |
30/11/2020 | Foot pedal bin- on site use for the month of December | Bin Rent | 2 | 20 | 40 | |
26/11/2020 | JOB-2780456-YOM8 | 120L Clinical Waste Bin | Service | 1 | 51.55 | 51.55 |
24/11/2020 | JOB-2783621-T1S7 | 240L Clinical Waste Bin | Collect | 8 | 65 | 520 |
10/11/2020 | JOB-2783620-P2M3 | 2A40L Clinical Waste Bin | Service | 1 | 65 | 65 |
19/01/2021 | JOB-2839597-G1C4 | Sharps Container 1.8L Sharpak | Service | 4 | 23.48 | 93.92 |
26/01/2021 | JOB-2855290-NOW9 | Futile Service Fee | OneOff | 1 | 50 | 50 |
29/01/2021 | JOB-3021769-R6V2 | 120L COVID19 Waste Bin | FOS | 1 | 91.8 | 91.8 |
I am with @Qiu regarding using a cleansing tool if you are scraping a PDF (*gag*). 🤣
If you can get rid of the symbols that should never exist (basically everything but the period), then you can always add the Job hyphen back in after the fact.
Are there any other formatting you need coming out? I was able to do the initial formatting of the expression as well though @Qiu is wonderful at uploading a nice workflow to go along with it which is probably more helpful! I'll see what I can do making my own assumptions.
Okay, so here is my first post attaching the actual workflow.
Main Parse:
(\d+\/\d+\/\d+) (JOB[0-9A-Z]*)?\s?(.*)\s(Bin Rent|Collect|FOS|OneOff|Service)\s(\d+)\s(\d+\.\d+)\s(\d+\.\d+)
Job Fix RegEx formula:
REGEX_Replace([Job Number], "(JOB)(\d{7})([A-Z0-9]*)", '$1' + '-' + '$2' + '-' + '$3')
Hope this helps!