Alteryx Designer Desktop Discussions

Dave · 12-21-2022

Hi all

I've built a number of workflows using the various PDF tools available in Alteryx to rename files for upload to internal servers.

Most of these workflows handle between 20 and 2,000 documents. They read each PDF, find the relevant ID, and then rename the file using a predefined naming structure that includes the ID.
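As a rough illustration of the rename step described above, here is a minimal sketch in plain Python. It assumes the text of the first page has already been extracted, looks for the identifier on the first non-empty line only, and builds the new filename from it. The `ID_PATTERN` regex and `NAME_TEMPLATE` are hypothetical placeholders; substitute the ID format and naming structure your documents actually use.

```python
import re
from typing import Optional

# Hypothetical ID format ("ID: ABC-12345"); replace with your real pattern.
ID_PATTERN = re.compile(r"\bID:\s*([A-Z]{3}-\d{5})\b")

# Hypothetical naming structure for the renamed files.
NAME_TEMPLATE = "{doc_id}_document.pdf"


def first_nonempty_line(text: str) -> str:
    """Return the first line that is not just whitespace, stripped."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ""


def extract_id(text: str) -> Optional[str]:
    """Search for the identifier on the first non-empty line only."""
    match = ID_PATTERN.search(first_nonempty_line(text))
    return match.group(1) if match else None


def build_new_name(text: str) -> Optional[str]:
    """Build the target filename, or return None if no ID was found."""
    doc_id = extract_id(text)
    return NAME_TEMPLATE.format(doc_id=doc_id) if doc_id else None


if __name__ == "__main__":
    sample = "\n  ID: ABC-12345  Statement\nPage 1 of 3\n"
    print(build_new_name(sample))  # ABC-12345_document.pdf
```

Restricting the search to the first line keeps the check cheap and avoids picking up a similar-looking reference further down the page; if the ID can appear elsewhere, search the whole text instead.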

I've just been asked to apply one of these workflows to a folder containing 9,000 documents. As expected, parsing the data is taking a very long time, and I would like to cut this down as much as possible. So I have two questions:

1- Is there a way to tell the PDF tool to read only the first line of data in each document (where the identifier is located) and then move on to the next document? I've tried the PDF input tool and the image reader tools and can't see a way to do this, but I thought I would ask.

2- If not, can you recommend a tool that will do the heavy lifting of scraping data from the PDFs as quickly as possible?
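For question 1, one way to avoid parsing whole documents outside of Alteryx is to run text extraction on page 1 only. The sketch below assumes the third-party `pypdf` package (`PdfReader(...).pages[0].extract_text()`) and a hypothetical `name_for` function that turns the page text into a new filename, returning `None` when no identifier is found. It builds a rename plan first so it can be reviewed, then renames without overwriting existing files. This is not an Alteryx feature, and it only helps with text-based PDFs; scanned images would still need OCR.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Optional


def pdfs_in(folder: Path) -> List[Path]:
    """List PDFs in a folder, matching .pdf and .PDF alike."""
    return sorted(p for p in folder.iterdir() if p.suffix.lower() == ".pdf")


def first_page_text(pdf_path: Path) -> str:
    """Extract text from page 1 only; assumes the third-party pypdf package."""
    from pypdf import PdfReader  # lazy import so the other helpers work without it
    reader = PdfReader(pdf_path)
    return reader.pages[0].extract_text() or ""


def plan_renames(
    pdfs: Iterable[Path],
    name_for: Callable[[str], Optional[str]],
    extract: Callable[[Path], str] = first_page_text,
) -> Dict[Path, Path]:
    """Map each PDF to its new path; files where name_for returns None are skipped."""
    plan: Dict[Path, Path] = {}
    for pdf in pdfs:
        new_name = name_for(extract(pdf))
        if new_name is not None:
            plan[pdf] = pdf.with_name(new_name)
    return plan


def apply_renames(plan: Dict[Path, Path]) -> None:
    """Rename files, refusing to overwrite any file that already exists."""
    for old, new in plan.items():
        if new == old:
            continue  # already has the target name
        if new.exists():
            raise FileExistsError(f"{new} already exists; not overwriting")
        old.rename(new)
```

Because the plan is a plain dictionary, it can be printed or checked for duplicate target names before calling `apply_renames`. If extraction is still slow across 9,000 files, `concurrent.futures.ProcessPoolExecutor` from the standard library is one option for spreading the per-file work across CPU cores.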

Any help would be greatly appreciated

Dave

gautiergodard · 12-21-2022

Hey @Dave

To answer your questions:

1) Yes, you can specify the region of a PDF that you want to read by using the Image Template tool in the Computer Vision tool palette.

2) If you are processing system-generated PDFs (not scanned copies of documents, which are images), Alteryx recently released a new PDF to Text tool that greatly increases the accuracy and speed of extraction. Here is the link to the new tool for your reference: PDF to Text | Alteryx Help

Hope this helps!

Dave · 12-21-2022

That's very interesting. I'll give the new tool a whirl, thank you.
