

Reading Information from a pdf File

THECUSE4463
6 - Meteoroid

I am reading a folder that contains multiple PDF (invoice) files, and I want to use a specific piece of information that will definitely appear on EACH invoice. The following appears on every invoice, but not necessarily in the EXACT same location (i.e. the same row):

 

Total Charges                        $25,000.00

 

Is there a way for Alteryx to be "programmed" to "find" that specific "string", regardless of where it appears on the file/invoice?

Bluebird_Tim
7 - Meteor

Hey there! Great question re: PDFs. First question: do you already have the Intelligence Suite add-on? How are you reading PDFs now?

THECUSE4463
6 - Meteoroid

Bluebird_Tim,

Thanks for getting in touch. To answer your question, yes, I do have the Intelligence Suite installed, and I do use it to read the pdf files. I've attached a screenshot extract of a section of the pdf (invoice) that I have set up as the "Image Template" in my Alteryx workflow.

Any help, assistance, or advice you can provide would be greatly appreciated.

Bluebird_Tim
7 - Meteor

Got it. I have a couple of thoughts.

 

Depending on how the PDF comes in, you could try to read the whole thing as text and then parse out the value you need.
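To make the "read everything, then parse" idea concrete, here is a minimal sketch in Python of searching the extracted text for the label, wherever it lands. It assumes the OCR step has already produced plain text for the invoice and that the label reads "Total Charges" followed by a dollar amount, as in the original post. The function name `find_total_charges` and the sample text are hypothetical, and a similar pattern could likely be adapted for Designer's RegEx tool.

```python
import re
from decimal import Decimal
from typing import Optional

# Label, then any run of whitespace/colons/dots (including line breaks),
# then an optional "$" and the amount. Case-insensitive to tolerate OCR quirks.
TOTAL_RE = re.compile(
    r"Total\s+Charges\s*[:.\s]*\$?\s*(\d[\d,]*(?:\.\d+)?)",
    re.IGNORECASE,
)


def find_total_charges(text: str) -> Optional[Decimal]:
    """Return the first 'Total Charges' amount found in text, or None."""
    match = TOTAL_RE.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting to a number.
    return Decimal(match.group(1).replace(",", ""))


# Hypothetical OCR output for one invoice.
sample = (
    "Invoice 123\n"
    "Subtotal $24,000.00\n"
    "Total Charges                        $25,000.00\n"
    "Thank you"
)
print(find_total_charges(sample))
```

Because the pattern keys on the label rather than on a row or position, it does not matter where on the page the line appears, as long as the OCR returns the label and the amount next to each other.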

 

The better technique, I think, would be to grab a larger part of the PDF and then parse out the Total Charges area. This depends on exactly how things move from PDF to PDF, but essentially you determine a "safe" capture area where you know the total will show up. I attached a screenshot of what I would try. There isn't a perfect way to track values that shift slightly across PDFs; this is about the best way to accommodate that when the information generally stays in the same area of the page.

 

Additionally, if you haven't already tried it, this tool is helpful if there are any issues getting the correct characters back: https://help.alteryx.com/20231/designer/image-processing. It makes the image easier for the OCR to read.

You might have to experiment; this is where the "art" of things comes in.

 

Bluebird_Tim_0-1685540323779.png

Hope this helps!
