Alteryx Designer Desktop Discussions

SOLVED

Extracting values from a specific PDF page

NeethaMalik
5 - Atom

Hi all,

I am new to Alteryx and am trying to read PDF/image files. The data in these files is scattered. I have the Alteryx Intelligence Suite and have used it to convert the data to text. The files have 14+ pages, but I am only interested in one page and the data on it. Does anyone have any tips to help me?

6 REPLIES
Raj
16 - Nebula

Use the Image Template tool.

NeethaMalik
5 - Atom

I tried the Image Template tool. It pulls the data correctly for one PDF file, but as soon as I run the workflow for multiple files, it returns gibberish or data from elements adjacent to the ones I highlighted for the other PDF files.

alexnajm
18 - Pollux

You can use the Image Input tool to read in the list of pages from that PDF, then use a Filter to limit it to just the page you need. The Image Template tool should then work well on that single page!

NeethaMalik
5 - Atom

Thank you Alex,

It helped me narrow my search to just one page instead of all pages, which is great! The problem now is that the output does not necessarily come from the fields I highlighted in the Image Template tool. It's working fine for one row but not for all the rows.

NeethaMalik_1-1678971797916.png

BS_THE_ANALYST
14 - Magnetar

@NeethaMalik The approach I take is with the PDF to Text tool:

BS_THE_ANALYST_0-1678972278440.png

Then you can use some filtering logic like page = blah and column contains blah. It is certainly a lot more involved in terms of parsing, but it will bring in every piece of data without missing anything.
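
To make the "page = blah and column contains blah" idea concrete, here is a minimal sketch in plain Python rather than Alteryx tools. The row layout (a `page` number and a `text` string per extracted line), the sample values, and the `filter_rows` helper are all assumptions for illustration; they are not the actual output schema of the PDF to Text tool.

```python
# Hypothetical rows standing in for line-by-line output of a PDF-to-text step.
# The field names "page" and "text" are assumptions, not the tool's real schema.
rows = [
    {"page": 1, "text": "Annual report"},
    {"page": 7, "text": "Invoice Total: 1,250.00"},
    {"page": 7, "text": "Prepared by finance"},
    {"page": 8, "text": "Invoice Total: 99.00"},
]


def filter_rows(rows, page, keyword):
    """Keep rows from one page whose text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [r for r in rows if r["page"] == page and needle in r["text"].lower()]


# Keep only lines on page 7 that mention "total".
matches = filter_rows(rows, page=7, keyword="total")
for row in matches:
    print(row["page"], row["text"])
```

In a workflow, the same two conditions would roughly correspond to a Filter on the page field combined with a contains-style test on the text field, followed by whatever parsing the matched lines need.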

 

All the best,
BS

NeethaMalik
5 - Atom

Thank you, this indeed worked.
