Hello All
I am trying to pull out specific details from a set of PDF files. I can do this for a single PDF file by setting up a template for that file. But is it possible to pull out specific fields from a set of PDF files with different structures? Is it possible to set a standard template, or is there a tool that can search for those particular fields and load them into the Image to Text tool? Any help is highly appreciated. Thank you
@PreethiS you could have a play with Annotations and save these for your PDF files. The annotations will be applied across all of the documents that you read in. Here is some more information:
https://help.alteryx.com/current/designer/image-template
I would also recommend checking out the new version of the PDF processing tools in our Computer Vision suite. These have improved, and I'd say they will help in your scenario:
https://help.alteryx.com/designer/computer-vision
Thank you Rishik. I have used annotation for a single PDF file, but I am trying to pull out specific financial statements from detailed documentation spread across 15 PDFs. Is there any way to automatically detect the financial fields and pull the data out of the detailed PDF files?
@PreethiS the annotation will work on a batch of PDFs, for example your 15 PDF files. If the data in the files matches the annotation, then this should work.
Have you given the Computer Vision tools a go? They can now detect tables automatically and extract them into the workflow.
Thank you Rishi. But I am able to annotate only one PDF file. Also, if it has 2 pages, the data for the second page goes to the next row. How do I annotate all 15 PDFs?
@PreethiS you should be able to annotate the data across the pages and files. If you book a Virtual Solutions Center session with us, we can explore your problem more specifically via a Teams call:
https://community.alteryx.com/t5/Virtual-Solution-Center/tkb-p/vsc
Thank you
Hi, I'm quite late to the party, but has a solution been found? Specifically, how can multiple annotation files be used for differently formatted PDF files?
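No confirmed solution was posted in this thread. One possible approach, sketched here in Python purely as an illustration and not as an Alteryx feature, is to decide which layout each document follows from keywords in its extracted text, and then pick up a labelled value with a pattern search instead of a fixed page position. Everything named below is hypothetical: the layout names, the keyword lists, and the helpers `pick_layout` and `extract_field`. It also assumes the PDF text has already been extracted, for example by the Image to Text tool.

```python
import re

# Hypothetical mapping: layout name -> phrases expected in that layout's text.
LAYOUT_KEYWORDS = {
    "layout_a": ["Statement of Financial Position", "Total Assets"],
    "layout_b": ["Balance Sheet", "Total Liabilities"],
}


def pick_layout(text):
    """Return the layout whose keywords match the text most often, or None."""
    lowered = text.lower()
    scores = {
        name: sum(keyword.lower() in lowered for keyword in keywords)
        for name, keywords in LAYOUT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def extract_field(text, label):
    """Find a number following `label` on the same line; return it as a float."""
    # Skip separators such as ':' or spaces, then capture digits with
    # optional thousands commas and an optional decimal part.
    pattern = re.escape(label) + r"[^\d\n-]*(-?\d[\d,]*(?:\.\d+)?)"
    match = re.search(pattern, text, flags=re.IGNORECASE)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


sample = "Balance Sheet\nTotal Liabilities: 1,200.50"
print(pick_layout(sample))
print(extract_field(sample, "Total Liabilities"))
```

The layout name returned by `pick_layout` could then be used to choose which saved annotation file to apply to that document, while `extract_field` is a fallback for fields that carry a consistent label but move around between layouts.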
