Hi all,
I am new to Alteryx and I am trying to read PDF/image files. The data in these files is scattered. I have Alteryx Intelligence Suite and I have used it to convert the data to text. The files have 14+ pages, but I am only interested in one page and the data on that page. Does anyone have any tips to help me?
Use the Image Template tool.
I tried the Image Template tool. It pulls the right data for one PDF file, but the moment I run the workflow on multiple files, it returns gibberish or data from elements adjacent to the ones I highlighted in the other PDF files.
You can use the Image Input tool to read in the list of pages from that PDF, then use a Filter to limit to just the page you need. Then using the Image Template tool should work well!
Thank you Alex,
It helped me narrow my search to just one page as opposed to all pages, this is great!! The problem I am dealing with now is that the output does not always come from the fields I highlighted in the Image Template tool. It's working fine for one row but not for all the rows.
@NeethaMalik The approach I take is with the PDF to Text tool:
Then you can use some filtering logic like page = blah, and columns contain blah. It's certainly a lot more involved in terms of parsing, but it'll bring in every piece of data without missing things.
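In case it helps to see that filtering logic written out, here is a rough sketch in Python rather than in Alteryx tools. It assumes the extracted text has already been turned into rows with a `page` field and a `text` field. Those field names, the `filter_page_rows` helper, and the sample rows are all made up for illustration and are not the actual output schema of the PDF to Text tool.

```python
def filter_page_rows(rows, page, keywords):
    """Keep rows from one page whose text contains any keyword (case-insensitive).

    `rows` is a list of dicts with hypothetical "page" and "text" keys.
    """
    wanted = [k.lower() for k in keywords]
    return [
        row
        for row in rows
        if row["page"] == page
        and any(k in row["text"].lower() for k in wanted)
    ]


# Made-up sample rows standing in for extracted PDF text.
sample_rows = [
    {"page": 1, "text": "Cover sheet"},
    {"page": 7, "text": "Invoice Total: 1,250.00"},
    {"page": 7, "text": "Footer text"},
    {"page": 8, "text": "Invoice Total: 99.00"},
]

# "page = blah" and "columns contain blah" from the reply above.
matches = filter_page_rows(sample_rows, page=7, keywords=["invoice total"])
print(matches)
```

In a workflow, the same two conditions would presumably go into a Filter tool expression. Because this matches on text content instead of fixed positions on the page, it should be less sensitive to layout shifts between files.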
Thank you, this indeed worked.