I have the output of a PDF parser tool that splits a matrix (table) in a PDF into separate rows.
Please find attached.
I am unable to correctly parse the Site, Address and Description columns to match the expected output.
I want the output in the way the expected output is.
Will regex work to extract the data in the correct format? If so, how?
Thanks.
Hello @HW1
Looking at your workflow, there are a few things going on here.
1. When you filter the data, you are left with rows (like Rows 1, 2, and 22) that appear to be new headers.
2. The data within the rows is not separated by consistent delimiters. For example, Line 4 looks like "31/12/20 120L Clinical Waste Bin for the month of January Bin Rent 2 4.33 8.66",
but Line 3 looks like "15/12/20 | JOB-2776383-N61T7 120L Clinical Waste Bin Service 1 34.45 34.45".
This adds another level of data prep, because you need consistent delimiters before you can separate your data.
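To illustrate point 2, here is a minimal regex sketch in Python that handles both line shapes quoted above. It assumes every data row starts with a dd/mm/yy date, optionally followed by "| JOB-...", and ends with three numeric fields. The group names (`job`, `qty`, `unit_price`, `total`) and the `parse_line` helper are my own labels for illustration, not names from your file, and the sketch does not cover the Site or Address columns.

```python
import re

# Assumed row shape: date, optional "| JOB-<ref>", free-text description,
# then quantity, unit price and total. Group names are hypothetical labels.
LINE_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{2})\s+"
    r"(?:\|\s*(?P<job>JOB-\S+)\s+)?"      # job reference is optional
    r"(?P<description>.+?)\s+"            # lazy: stops before the numeric tail
    r"(?P<qty>\d+)\s+"
    r"(?P<unit_price>\d+\.\d{2})\s+"
    r"(?P<total>\d+\.\d{2})$"
)


def parse_line(line):
    """Return a dict of fields, or None if the line is not a data row."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


line_3 = "15/12/20 | JOB-2776383-N61T7 120L Clinical Waste Bin Service 1 34.45 34.45"
line_4 = "31/12/20 120L Clinical Waste Bin for the month of January Bin Rent 2 4.33 8.66"

for line in (line_3, line_4):
    print(parse_line(line))
```

Rows that do not fit this shape, such as the repeated header rows from point 1, come back as `None`, so they can be filtered out or handled separately.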
I would recommend addressing #1 first, though, as each of these rows appears to start a new dataset. If so, what is their importance?