Hello!
I am looking for solutions regarding the following logic:
1. A PDF file containing multiple scanned documents needs to be split into separate documents based on a specific separator page.
2. For each document, the program then needs to extract certain information (there are different document types) and add it to a different Excel file depending on the type.
I have tried installing the PDF to Text tool, but it gives me an error. The Parse PDF tool also does not work with this type of file.
Any suggestions or help would be highly appreciated!
Have you tried the Alteryx Intelligence Suite? There is a native PDF tool that may work better for you.
https://www.alteryx.com/products/alteryx-platform/intelligence-suite
Hello Brandon,
I have tried the Alteryx Intelligence Suite; however, I was only able to split the main PDF file. Do you have any suggestions on how to split the documents and also extract parts of each split document using a template (there are different types of scanned documents in the main PDF)?
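To make the split step concrete, here is a minimal sketch in plain Python rather than an Alteryx workflow. It assumes the OCR text of each page is already available as a string (for example from an OCR tool), and that the separator page carries a recognizable marker. The marker text, the type keywords, and the function names are all hypothetical and would need to match the real scans.

```python
from typing import Dict, List

# Hypothetical text printed on the separator sheet; adjust to the real scans.
SEPARATOR_MARKER = "SEPARATOR PAGE"

# Hypothetical keywords used to guess the document type from its first page.
TYPE_KEYWORDS: Dict[str, str] = {
    "invoice": "INVOICE",
    "contract": "CONTRACT",
}


def is_separator(page_text: str) -> bool:
    """Return True when the OCR text of a page looks like a separator sheet."""
    return SEPARATOR_MARKER in page_text.upper()


def split_documents(pages: List[str]) -> List[List[str]]:
    """Group consecutive pages into documents, dropping the separator pages."""
    documents: List[List[str]] = []
    current: List[str] = []
    for text in pages:
        if is_separator(text):
            # A separator closes the document collected so far (if any).
            if current:
                documents.append(current)
            current = []
        else:
            current.append(text)
    # Keep the trailing document when the file does not end with a separator.
    if current:
        documents.append(current)
    return documents


def classify(document: List[str]) -> str:
    """Guess the document type from keywords on the first page."""
    first_page = document[0].upper()
    for doc_type, keyword in TYPE_KEYWORDS.items():
        if keyword in first_page:
            return doc_type
    return "unknown"
```

Each group returned by `split_documents` can then be passed to `classify`, so that every document type is routed to its own extraction template and its own output file.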
Hi @Abaicu_23
If you are comfortable with Regex and cannot follow what @BrandonB suggested (using the OCR Template capability within IS), I recommend the PDF Reader Macro from the Public Gallery. It should let you scan everything into text format, and you can then use Regex to parse what you need.
Link:
https://gallery.alteryx.com/#!app/PDF-Input--Text-and-Image-/5be5ec8d0462d71ffce6deaa
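As a rough illustration of the Regex step once the text has been extracted, here is a small Python sketch (the same patterns could be adapted for the RegEx tool in Alteryx). The document types, field names, and patterns are hypothetical examples, not taken from your files, so treat them as placeholders for your own layouts.

```python
import re
from typing import Dict, Optional

# Hypothetical field patterns per document type; adjust to the real layouts.
FIELD_PATTERNS: Dict[str, Dict[str, str]] = {
    "invoice": {
        "invoice_number": r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*(\w+)",
        "total": r"Total\s*:?\s*([\d.,]+)",
    },
    "contract": {
        "contract_id": r"Contract\s*ID\s*[:#]?\s*(\w+)",
    },
}


def extract_fields(doc_type: str, text: str) -> Dict[str, Optional[str]]:
    """Return one row of extracted fields for a document of the given type.

    Fields whose pattern does not match are set to None, so OCR misses
    stay visible instead of being silently dropped.
    """
    row: Dict[str, Optional[str]] = {}
    for field, pattern in FIELD_PATTERNS.get(doc_type, {}).items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        row[field] = match.group(1) if match else None
    return row
```

Calling `extract_fields("invoice", text)` on the text of one document gives one row of values; the rows for each type can then be collected and written to that type's Excel file.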