SOLVED

Extracting values from a specific PDF page

NeethaMalik
5 - Atom

Hi all,

I am new to Alteryx and I am trying to read PDF/image files. The data in these files is scattered. I have Alteryx Intelligence Suite and have used it to convert the data to text. The files have 14+ pages, but I am only interested in one page and the data on it. Does anyone have any tips to help me?

6 REPLIES
Raj
16 - Nebula

Use the Image Template tool.

NeethaMalik
5 - Atom

I tried the Image Template tool. It pulls the data correctly for one PDF file, but the moment I run the workflow for multiple files, it returns gibberish or data from elements adjacent to the ones I highlighted for the other PDF files.

alexnajm
18 - Pollux

You can use the Image Input tool to read in the list of pages from that PDF, then use a Filter to limit it to just the page you need. After that, the Image Template tool should work well!

NeethaMalik
5 - Atom

Thank you Alex,

It helped me narrow my search to just one page as opposed to all pages, this is great!! Now the problem I am trying to deal with is that the data output is not necessarily from the fields I highlighted in the Image Template. It's working fine for one row but not for all the rows.

[Screenshot: NeethaMalik_1-1678971797916.png]

BS_THE_ANALYST
14 - Magnetar

@NeethaMalik The approach I take is with the PDF to Text tool:

[Screenshot: BS_THE_ANALYST_0-1678972278440.png]

Then you can use some filtering logic like page = blah, and columns contain blah. It's certainly a lot more involved in terms of parsing, but it'll bring in every piece of data without missing things.
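
For anyone who wants to see that filtering logic spelled out, here is a rough Python sketch of the idea (it could be adapted to a Python tool or rebuilt with Filter and RegEx tools). Everything in it is an assumption for illustration: the field names `file`, `page`, and `text`, the sample rows, and the "Label: value" layout are made up and are not the actual output of PDF to Text.

```python
import re

# Hypothetical rows standing in for text already extracted from a PDF.
# The field names "file", "page", and "text" are assumptions, not a real schema.
rows = [
    {"file": "a.pdf", "page": 1, "text": "Cover sheet"},
    {"file": "a.pdf", "page": 7, "text": "Invoice Total: 1,250.00"},
    {"file": "a.pdf", "page": 7, "text": "Due Date: 2023-03-31"},
    {"file": "a.pdf", "page": 8, "text": "Invoice Total: 99.00"},
]


def extract_fields(rows, page, labels):
    """Return {label: value} for 'Label: value' lines found on one page."""
    found = {}
    for row in rows:
        # Step 1: page = blah (keep only the page of interest).
        if row["page"] != page:
            continue
        # Step 2: text contains blah (look for each label, capture what follows).
        for label in labels:
            match = re.search(rf"{re.escape(label)}\s*:\s*(.+)", row["text"])
            if match:
                found[label] = match.group(1).strip()
    return found


result = extract_fields(rows, page=7, labels=["Invoice Total", "Due Date"])
print(result)
```

Because the match is on the label text rather than on a position on the page, the same logic should keep working across files where the layout shifts a little, which is where a fixed template tends to pick up adjacent values.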

 

All the best,
BS

NeethaMalik
5 - Atom

Thank you, this indeed worked.
