Get data from Scanned PDF Files

Abaicu_23
5 - Atom

Hello!

 

I am looking for solutions regarding the following logic:

 

1. A PDF file containing multiple scanned documents needs to be split into separate documents based on a specific separator page.

2. For each document, the program then needs to extract certain information (there are different document types) and add it to a different Excel file depending on the type.

 

I have tried installing PDF to Text, but it is giving me an error. Parse PDF is also not working with this type of file.

 

Any suggestions or help with this would be highly appreciated!

3 REPLIES
BrandonB
Alteryx

Have you tried the Alteryx Intelligence Suite? There is a native PDF tool that may work better for you.

 

https://www.alteryx.com/products/alteryx-platform/intelligence-suite 

Abaicu_23
5 - Atom

Hello Brandon,

 

I have tried the Alteryx Intelligence Suite; however, I was only able to split the main PDF file. Do you have any suggestions on how to split the documents and also extract parts of each split using a template (there are different types of scanned documents in the main PDF)?

pedrodrfaria
13 - Pulsar

Hi @Abaicu_23 

 

If you know your way around Regex and cannot follow what @BrandonB suggested (using the OCR Template capability within Intelligence Suite), I recommend using the PDF Reader Macro from the Public Gallery. This way you should be able to scan everything into a text format, and then you can use Regex to parse what you need.

 

Link:

https://gallery.alteryx.com/#!app/PDF-Input--Text-and-Image-/5be5ec8d0462d71ffce6deaa 
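As a rough illustration of the "text first, then Regex" idea, here is a minimal Python sketch. It assumes the OCR step has already produced one text string per page; it does not call the macro or any Alteryx API. The separator text, the document types, and every field pattern below are hypothetical placeholders that you would replace with whatever actually appears on your scans.

```python
import re

# Hypothetical text that OCR yields for the separator page; adjust to your scans.
SEPARATOR = re.compile(r"DOCUMENT SEPARATOR", re.IGNORECASE)

# Hypothetical document types, each with a detection pattern and field patterns.
TEMPLATES = {
    "invoice": {
        "detect": re.compile(r"\bInvoice\b", re.IGNORECASE),
        "fields": {
            "number": re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*(\w+)", re.IGNORECASE),
            "total": re.compile(r"Total\s*:?\s*([\d.,]+)", re.IGNORECASE),
        },
    },
    "contract": {
        "detect": re.compile(r"\bContract\b", re.IGNORECASE),
        "fields": {
            "party": re.compile(r"Between\s*:?\s*(.+)", re.IGNORECASE),
        },
    },
}


def split_documents(pages):
    """Group page texts into documents, dropping the separator pages."""
    documents, current = [], []
    for text in pages:
        if SEPARATOR.search(text):
            # A separator closes the document collected so far, if any.
            if current:
                documents.append(current)
            current = []
        else:
            current.append(text)
    if current:
        documents.append(current)
    return documents


def extract(document_pages):
    """Return (type, fields) for one document, or ("unknown", {}) if no type matches."""
    text = "\n".join(document_pages)
    for doc_type, template in TEMPLATES.items():
        if template["detect"].search(text):
            fields = {}
            for name, pattern in template["fields"].items():
                match = pattern.search(text)
                # Fields that are not found stay as None rather than being guessed.
                fields[name] = match.group(1).strip() if match else None
            return doc_type, fields
    return "unknown", {}


if __name__ == "__main__":
    sample_pages = [
        "Invoice No: A123\nTotal: 45.50",
        "second page of the invoice",
        "DOCUMENT SEPARATOR",
        "Contract\nBetween: Acme Ltd",
    ]
    for doc in split_documents(sample_pages):
        print(extract(doc))
```

The resulting type can then decide which Excel file each row of fields is written to. Scanned text is often noisy, so real patterns usually need to be more forgiving than these.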
