Alteryx Designer Desktop Discussions

PDF Import - Intelligence Suite

PeterAP
8 - Asteroid

I have just installed a trial of the Intelligence Suite to test importing PDF files (e.g. incident tickets) using the Image Template, Image to Text and PDF Input tools, but I am hitting a few problems.

The first few fields of each PDF are picked up correctly; however, later fields are not.

I think this might be because a field is not always in exactly the same place on each page, e.g. when a field above it runs over multiple lines. Does the tool only pick up text from a fixed position in the document, rather than text positioned relative to other text (e.g. a nearby header)?

Is there another way to do this?

Workflow.PNG

BrandonB
Alteryx

That is correct. If you look at the markup string output by the Image Template tool, you will see that it specifies coordinates to pull from the image. In some scenarios you can use a slightly larger region to accommodate shifting, but if the layout differs too much you may need to read in a larger portion of the document and apply parsing techniques instead.
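To illustrate why a fixed-coordinate region misses a field that moves, and how padding the region helps, here is a minimal Python sketch. It assumes the OCR output is available as a list of words with bounding boxes; the `Word` and `Region` types and the `extract` helper are hypothetical and do not reflect the actual format of the Image Template markup string.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR'd word with its bounding box (hypothetical structure)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


@dataclass
class Region:
    """A rectangular template region defined by fixed page coordinates."""
    left: float
    top: float
    right: float
    bottom: float

    def padded(self, pad: float) -> "Region":
        # Grow the region by `pad` units on every side to tolerate shifting.
        return Region(self.left - pad, self.top - pad,
                      self.right + pad, self.bottom + pad)

    def contains_center_of(self, word: Word) -> bool:
        # A word counts as inside if its center point falls in the region.
        cx = word.x + word.w / 2
        cy = word.y + word.h / 2
        return self.left <= cx <= self.right and self.top <= cy <= self.bottom


def extract(words: list[Word], region: Region, pad: float = 0.0) -> str:
    """Return the text of all words inside the (optionally padded) region."""
    area = region.padded(pad)
    hits = [w for w in words if area.contains_center_of(w)]
    # Order top-to-bottom, then left-to-right.
    hits.sort(key=lambda w: (w.y, w.x))
    return " ".join(w.text for w in hits)


# The region was drawn where the value sat on page 1; on page 2 a
# multi-line field above pushed the value 15 units further down.
region = Region(100, 195, 200, 215)
page1 = [Word("High", 110, 200, 30, 10)]
page2 = [Word("High", 110, 215, 30, 10), Word("Status", 110, 240, 40, 10)]

print(extract(page1, region))          # value found
print(extract(page2, region))          # value missed: it moved out of the box
print(extract(page2, region, pad=10))  # padded box catches it again
```

The trade-off is that a larger pad also makes it more likely to pull in a neighbouring field, which is why heavily varying layouts tend to need text parsing rather than a bigger box.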

PeterAP
8 - Asteroid

Has anyone else found useful ways of parsing PDFs with varying numbers of pages or field sizes?

I've found it quite useful to use the Find Replace tool to find field headers and replace each one with the header plus a £ sign, then use Text to Columns to parse out the text I want.
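The same two-step idea (mark each header with a delimiter, then split on it) can be sketched in Python for illustration. This is not Alteryx code; the header names are hypothetical, and the sketch assumes the £ is placed in front of each header so that every split chunk starts with its own header.

```python
DELIM = "£"


def mark_headers(text: str, headers: list[str], delim: str = DELIM) -> str:
    """Find Replace step: prefix every known header with the delimiter."""
    for header in headers:
        text = text.replace(header, delim + header)
    return text


def split_fields(marked: str, headers: list[str], delim: str = DELIM) -> dict[str, str]:
    """Text to Columns step: split on the delimiter, then map chunks to headers."""
    fields = {}
    for chunk in marked.split(delim):
        chunk = chunk.strip()
        for header in headers:
            if chunk.startswith(header):
                # Drop the header text itself and keep only the value.
                fields[header.rstrip(":")] = chunk[len(header):].strip()
                break
    return fields


headers = ["Ticket ID:", "Reported By:", "Description:"]  # hypothetical headers
text = "Ticket ID: INC-001 Reported By: Ann Lee Description: Printer jam on floor 2"
print(split_fields(mark_headers(text, headers), headers))
```

Two things to watch: the delimiter must never occur in the data itself (a £ sign in a cost field would break the split), and a plain replace will also mark a header string that happens to appear inside a field's value.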

One problem I have run into is that some text does not come through the Intelligence Suite tools at all.

madisonhoff
5 - Atom

Has anyone found a solution for this? I am having the same issue.

mathieuf
Alteryx

Hi @madisonhoff @PeterAP,

Another approach is to import the full PDF (without a template), then parse the text by searching for keywords, similar to XML or HTML parsing.
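As a rough illustration of keyword-based parsing, the Python sketch below assumes the full PDF text has already been extracted into one string in which each field header starts its own line followed by a colon; the keyword list and the `parse_by_keywords` helper are hypothetical. Each value runs from its keyword to the next keyword, so fields can span several lines or shift position without breaking extraction.

```python
import re


def parse_by_keywords(full_text: str, keywords: list[str]) -> dict[str, str]:
    """Slice the text between consecutive line-leading 'Keyword:' markers."""
    # Only match a keyword at the start of a line (after optional spaces/tabs),
    # so the same word appearing mid-sentence in a value is ignored.
    pattern = re.compile(
        r"^[ \t]*(" + "|".join(re.escape(k) for k in keywords) + r")[ \t]*:",
        re.MULTILINE,
    )
    matches = list(pattern.finditer(full_text))
    fields = {}
    for i, m in enumerate(matches):
        # The value ends where the next keyword begins (or at end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(full_text)
        # Collapse line breaks and repeated whitespace into single spaces.
        # If a keyword repeats (e.g. on every page), the last occurrence wins.
        fields[m.group(1)] = " ".join(full_text[m.end():end].split())
    return fields


text = """INCIDENT REPORT
Ticket ID: INC-0042
Summary: VPN drops every
few minutes for remote staff
Priority: High
Assigned To: Network Team
"""
keywords = ["Ticket ID", "Summary", "Priority", "Assigned To"]
print(parse_by_keywords(text, keywords))
```

Anchoring on line-leading keywords rather than page coordinates is what makes this tolerant of varying page counts and field lengths; it does, however, depend on the extracted text preserving the header line breaks.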
