Hi,
I am trying to process some purchase order files in PDF format with the computer vision tools.
The files span multiple pages, and the extracted fields don't line up from page to page, so the same data ends up in different columns. My output looks something like this extract:
Page | Column 1 | Column 2 | Column 3 | Column 4 | Column 5 | Column 6 | Column 7 | Column 8 | Column 9 |
1 | Line No. | Product | Code | Description of Goods | or Services | Qty | Unit of | ||
1 | Measure | ||||||||
2 | Line No. | Product | Code | Description of Goods or Services | Qty | Unit of | Unit Price Line | ||
3 | Line No. | Product | Code | Description of Goods | or Services | Qty | |||
3 | No. | Product | Code | Description of Goods | or Services | Qty | Unit of | Unit Price | |
3 | Measure |
I've removed the actual product data because it's sensitive, but as you can see, the headers from the different pages don't line up.
Any suggestions on how I can (easily?) line up the headers (and the associated data from each page) so that I can process the data accurately?
Thanks
PuffinPanic
I can't help with the computer vision tools; I use the PDF Macro from the gallery instead. I typically parse the header rows separately from the data. You only need one set of header names, so the repeated headers on the other pages can be filtered out. That way I only have to set up one header row and can deal with the data separately, which is usually a lot easier to handle.
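Outside of a workflow tool, the same idea (keep one header, filter out the repeats, handle the data on its own) could look roughly like the Python sketch below. Everything named in it is an assumption for illustration: the header names in CANONICAL_HEADER are guessed from the fragments in the extract above, HEADER_TOKENS assumes those words never make up a whole data row, the sample values are invented, and to_record assumes a data row has no legitimately blank fields.

```python
# Hypothetical header, typed once by hand from the fragments in the extract.
CANONICAL_HEADER = [
    "Line No.", "Product Code", "Description of Goods or Services",
    "Qty", "Unit of Measure", "Unit Price",
]

# Assumption: a row built only from these words is a header fragment.
HEADER_TOKENS = {
    "line", "no.", "product", "code", "description", "of", "goods",
    "or", "services", "qty", "unit", "measure", "price",
}


def is_header_row(cells):
    """True when the row has text and every word in it is a header word."""
    words = [w.lower() for cell in cells for w in cell.split()]
    return bool(words) and all(w in HEADER_TOKENS for w in words)


def to_record(cells):
    """Drop empty cells and map the rest onto the single header by position.

    Returns None when the count does not match, so the row can be reviewed
    by hand instead of being silently misaligned.
    """
    values = [c.strip() for c in cells if c.strip()]
    if len(values) != len(CANONICAL_HEADER):
        return None
    return dict(zip(CANONICAL_HEADER, values))


# Invented sample rows in the shape of the extract: (page, cells).
rows = [
    (1, ["Line No.", "Product", "Code", "Description of Goods", "or Services", "Qty", "Unit of", "", ""]),
    (1, ["Measure", "", "", "", "", "", "", "", ""]),
    (1, ["10", "ABC-1", "Widget", "", "5", "EA", "2.50", "", ""]),
    (2, ["Line No.", "Product", "Code", "Description of Goods or Services", "Qty", "Unit of", "Unit Price Line", "", ""]),
    (2, ["", "20", "XYZ-2", "Gadget", "1", "EA", "9.99", "", ""]),
]

# Filter out every header fragment, then align what is left to one header.
data_rows = [(page, cells) for page, cells in rows if not is_header_row(cells)]
records = [(page, to_record(cells)) for page, cells in data_rows]
```

Rows that come back as None from to_record are the ones where the column count is off, so they are worth checking by eye rather than trusting the positional match.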
Thanks @lwolfie, I'll have a look at your solution.