I need to scan a PDF document for a specific piece of information and then extract it to an Excel file. Is there any way of doing this in Alteryx without going down the Python route? The PDF input does not work for me, as my employer has not paid for the upgraded functions in Alteryx. Thanks in advance!
You will have to use the PDF input or Python. I don't know of any other method to do that; see the link below:
https://community.alteryx.com/t5/Alteryx-Designer-Discussions/How-To-Input-PDF-to-convert-to-Excel/t....
You can use R instead of Python; however, that is still a coding approach.
@JosephSerpis, can you please assist me with this R solution?
What do you need help with?
I have a scanned letter, so I think it is an image in PDF format. I need to read 2 pieces of information from it, which will always be in the same place. The Python and the R solutions are giving me errors...
Both the Python and R approaches are about tackling text in a PDF document rather than an image. The screenshot below shows the details of the R package used in the example I shared.
So how can I extract the data out of an image? I can't even install the extra R packages on my machine that someone else had mentioned here.
If your PDF files haven't been OCR'ed, you can use this 'PDF Input (Text and Image)' tool created by @DiganP:
https://gallery.alteryx.com/#!app/PDF-Input--Text-and-Image-/5be5ec8d0462d71ffce6deaa
This tool uses 2 additional R packages (pdftools and tesseract). If you are blocked from installing R packages to your C:\Program Files\Alteryx\R-.... folder, you could try running the two workflows attached that will install them to C:\Users\<username>\Documents\R\win-library\<version>
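For anyone who does end up on the Python route mentioned earlier in the thread, here is a rough sketch of the same idea: rasterise the scanned pages, OCR them, pull two values out of the text, and write them to a CSV file that Excel can open. It is not what the gallery tool does internally. It assumes the third-party packages pdf2image and pytesseract (plus the Poppler and Tesseract programs they wrap) can be installed, and the field labels "Reference" and "Date" are made up for illustration. It also finds the values by the label in front of them rather than by position on the page, which is a swapped-in technique.

```python
import csv
import re


def ocr_pdf_pages(pdf_path, dpi=300):
    """Rasterise each page of a scanned PDF and OCR it; returns one string per page.

    Assumption: pdf2image and pytesseract are installed, together with the
    Poppler and Tesseract programs they call.
    """
    # Imported here so the rest of the sketch works without these packages.
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(pdf_path, dpi=dpi)
    return [pytesseract.image_to_string(page) for page in pages]


# Hypothetical labels: replace with whatever text precedes the values in your letter.
FIELD_PATTERNS = {
    "reference": re.compile(r"\bReference\s*:\s*(\S+)", re.IGNORECASE),
    "date": re.compile(r"\bDate\s*:\s*(\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4})", re.IGNORECASE),
}


def extract_fields(text, patterns=FIELD_PATTERNS):
    """Return one dict per document; a field is left empty if its label is not found."""
    row = {}
    for name, pattern in patterns.items():
        match = pattern.search(text)
        row[name] = match.group(1) if match else ""
    return row


def write_rows(rows, csv_path, fieldnames):
    """Write the extracted rows to a CSV file, which Excel opens directly."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

To use it, something like write_rows([extract_fields(ocr_pdf_pages("letter.pdf")[0])], "letter.csv", ["reference", "date"]) would process the first page of a file called letter.pdf (a placeholder name). OCR output on scanned letters is rarely perfect, so the patterns will probably need loosening once you see what the text actually looks like.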
Hopefully that helps.