Alteryx Designer Desktop Discussions

PDF tables, multi-page, fields don't line up

PuffinPanic
9 - Comet

Hi,

 

I am trying to process some purchase order files that are in PDF format with the computer vision tools.

The files have multiple pages, and the extracted fields don't line up from page to page, so the same data ends up in different columns. My output looks something like this extract:

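| Page | Column 1 | Column 2 | Column 3 | Column 4 | Column 5 | Column 6 | Column 7 | Column 8 | Column 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Line No. | Product | Code | Description of Goods | or Services | | | Qty | Unit of |
| 1 | | | | | | | | | Measure |
| 2 | Line No. | Product Code | Description of Goods or Services | | Qty | Unit of | Unit Price | | Line |
| 3 | Line No. | Product Code | Description of Goods | or Services | | | Qty | | |
| 3 | No. | Product | Code | Description of Goods | or Services | | Qty | Unit of | Unit Price |
| 3 | | | | | | | | Measure | |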

I've removed the actual product data because it's sensitive, but as you can see, the headers from the different pages don't line up.

 

Any suggestions on how I can (easily?) line up the headers (and associated data from each page) so that I can process the data accurately please?

 

Thanks

 

PuffinPanic

2 REPLIES
lwolfie
11 - Bolide

I can't help with the computer vision tools; I use the PDF Macro in the gallery. I typically parse the header rows separately from the data. You only need one set of header names, so the repeated header rows can be filtered out. That way I only have to set up one header row, and I can then deal with the data on its own. This is usually a lot easier to handle.
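For anyone who would rather script this (for example in a Python tool), here is a minimal sketch of the same idea, not a tested workflow. It assumes the extracted rows arrive as lists of cell strings, that header rows contain only header words, and that data rows have no genuinely empty values in the middle. The names `CANONICAL_HEADER`, `HEADER_WORDS`, `is_header_row` and `split_header_and_data` are made up for illustration, and the sample rows are invented, not taken from the real purchase orders.

```python
# One header, typed once by hand (assumption: these are the real field names).
CANONICAL_HEADER = [
    "Line No.", "Product Code", "Description of Goods or Services",
    "Qty", "Unit of Measure", "Unit Price",
]

# Words assumed to appear only in header rows.
HEADER_WORDS = {
    "line", "no.", "product", "code", "description", "of", "goods",
    "or", "services", "qty", "unit", "measure", "price",
}


def is_header_row(cells):
    """True when the row has text and every word in it is a header word."""
    words = [w.lower() for cell in cells for w in cell.split()]
    return bool(words) and all(w in HEADER_WORDS for w in words)


def split_header_and_data(rows):
    """Drop repeated header rows, then map data cells onto the one header."""
    records = []
    for cells in rows:
        if is_header_row(cells):
            continue  # repeated page header, already covered by CANONICAL_HEADER
        values = [c.strip() for c in cells if c.strip()]  # left-compact the row
        if not values:
            continue  # blank row
        # Pad short rows; zip ignores any cells beyond the header width.
        values += [""] * (len(CANONICAL_HEADER) - len(values))
        records.append(dict(zip(CANONICAL_HEADER, values)))
    return records


# Invented sample rows, shifted the way the pages in the question are.
sample = [
    ["Line No.", "Product", "Code", "Description of Goods", "or Services", "", "", "Qty", "Unit of"],
    ["", "", "", "", "", "", "", "", "Measure"],
    ["1", "ABC-1", "", "Widget", "", "", "", "2", "EA"],
    ["Line No.", "Product Code", "Description of Goods or Services", "", "Qty", "Unit of", "Unit Price"],
    ["2", "XYZ-9", "Gadget", "", "5", "BOX", "1.50"],
]
for record in split_header_and_data(sample):
    print(record)
```

Left-compacting each row is the simplest way to realign shifted columns, but it breaks if a data row has a real blank value, so that case would need a rule of its own.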

PuffinPanic
9 - Comet

Thanks @lwolfie, I'll have a look at your solution.
