Alteryx Designer Desktop Discussions

PDF tables, multi-page, fields don't line up

PuffinPanic
9 - Comet

Hi,

 

I am trying to process some purchase order files in PDF format with the Computer Vision tools.

The files span multiple pages, and the extracted fields don't line up from page to page: the same data ends up in different columns. My output looks something like this extract:

Page | Column 1 | Column 2 | Column 3 | Column 4 | Column 5 | Column 6 | Column 7 | Column 8 | Column 9
1 | Line No. | Product | Code | Description of Goods | or Services | | | Qty | Unit of
1 | | | | | | | | | Measure
2 | Line No. | Product Code | Description of Goods or Services | | Qty | Unit of | Unit Price | | Line
3 | Line No. | Product Code | Description of Goods | or Services | | | Qty | |
3 | No. | Product | Code | Description of Goods | or Services | | Qty | Unit of | Unit Price
3 | | | | | | | | Measure |

I've removed the actual product data because it's sensitive, but as you can see, the headers from the different pages don't line up.

Any suggestions on how I can (easily?) line up the headers (and the associated data from each page) so that I can process the data accurately, please?

 

Thanks

 

PuffinPanic

2 REPLIES
lwolfie
11 - Bolide

I can't help with the computer vision tools. I use the PDF macro from the Gallery. I typically parse the header rows separately from the data. You only need one set of header names; the rest can be filtered out. That way I only need to set up one header row, and I can then deal with the data on its own. This is usually a lot easier to handle.
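
For anyone who wants to try the same idea outside the workflow (or in a Python tool), here is a rough sketch of splitting header rows from data rows. It is not the PDF macro's logic, just one way to do the filtering. The header fragments come from the sample in the question; the cell boundaries, the data rows, and the names `HEADER_FRAGMENTS`, `CANONICAL_HEADER`, `is_header_row` and `split_headers_from_data` are all made up for illustration.

```python
# Header text seen in the sample output, lower-cased (taken from the question).
HEADER_FRAGMENTS = {
    "line no.", "no.", "product", "code", "product code",
    "description of goods", "or services",
    "description of goods or services",
    "qty", "unit of", "measure", "unit price", "line",
}

# The single header row you set up once by hand (an assumed layout).
CANONICAL_HEADER = [
    "Line No.", "Product Code", "Description of Goods or Services",
    "Qty", "Unit of Measure", "Unit Price",
]


def is_header_row(cells):
    """True if the row has text and every non-empty cell is header text."""
    filled = [c.strip().lower() for c in cells if c and c.strip()]
    return bool(filled) and all(c in HEADER_FRAGMENTS for c in filled)


def split_headers_from_data(rows):
    """Split (page, cells) rows into header rows and everything else."""
    headers, data = [], []
    for page, cells in rows:
        (headers if is_header_row(cells) else data).append((page, cells))
    return headers, data


# Header rows follow the sample; the two data rows are invented placeholders.
rows = [
    (1, ["Line No.", "Product", "Code", "Description of Goods",
         "or Services", "", "", "Qty", "Unit of"]),
    (1, ["", "", "", "", "", "", "", "", "Measure"]),
    (1, ["1", "ABC", "123", "Widget", "", "", "", "5", "EA"]),
    (2, ["Line No.", "Product Code", "Description of Goods or Services",
         "", "Qty", "Unit of", "Unit Price", "", "Line"]),
    (2, ["2", "XYZ-9", "Gadget", "", "3", "EA", "4.50", "", "13.50"]),
]

headers, data = split_headers_from_data(rows)
for page, cells in data:
    print(page, cells)
```

The repeated header rows (including the stray "Measure" continuation lines) end up in `headers` and can be discarded, leaving only the data rows. Mapping each page's data columns onto `CANONICAL_HEADER` is a separate step and depends on how each page was split, so it is left out here.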

PuffinPanic
9 - Comet

Thanks @lwolfie, I'll have a look at your solution.
