Alteryx Designer Desktop Discussions

Idyllic_Data_Geek · ‎07-07-2021

I need to scan a PDF document for a specific piece of information and then extract it to an Excel file. Is there any way of doing this in Alteryx without going the Python route? The PDF Input tool does not work for me because my employer has not paid for the upgraded functions in Alteryx. Thanks in advance!

dougperez · ‎07-07-2021

You will have to use the PDF Input tool or Python. I don't know of any other method. See the link below:
https://community.alteryx.com/t5/Alteryx-Designer-Discussions/How-To-Input-PDF-to-convert-to-Excel/t....

JosephSerpis · ‎07-07-2021

You can use R instead of Python; however, that is still a coding approach.

https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/PDF-Parsing-in-Alteryx-using-R/ta-p...

Idyllic_Data_Geek · ‎07-08-2021

@JosephSerpis can you please assist me with this R solution?

JosephSerpis · ‎07-08-2021

What do you need help with?

Idyllic_Data_Geek · ‎07-08-2021

I have a scanned letter, so I think it is an image in PDF format. I need to read 2 pieces of information from it, which will always be in the same place. Both the Python and the R solutions are giving me errors...

JosephSerpis · ‎07-08-2021

Both the Python and R approaches are about extracting text from a PDF document rather than from an image. The screenshot below shows the details of the R package used in the example I shared.
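For anyone who can run Python, a quick way to tell which case a file falls into is to check whether its pages contain any embedded text. The following is a minimal sketch, not part of the linked examples: it assumes the third-party pypdf package is installed, and the function names and the `min_chars` threshold are hypothetical choices for illustration.

```python
from typing import List


def read_page_texts(pdf_path: str) -> List[str]:
    """Return the embedded text of each page (empty string if a page has none)."""
    # Imported lazily so looks_scanned() below can be used without pypdf.
    from pypdf import PdfReader  # third-party, assumed installed: pip install pypdf

    reader = PdfReader(pdf_path)
    return [(page.extract_text() or "") for page in reader.pages]


def looks_scanned(page_texts: List[str], min_chars: int = 20) -> bool:
    """Guess whether a PDF is image-only (no usable text layer).

    min_chars is an arbitrary threshold chosen for this sketch: a page with
    fewer non-whitespace characters is treated as having no text layer.
    """
    if not page_texts:
        return True
    return all(len("".join(text.split())) < min_chars for text in page_texts)
```

If `looks_scanned(read_page_texts("letter.pdf"))` returns True (the file name is a placeholder), text-extraction tools will likely return nothing useful and an OCR step is needed first.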

Idyllic_Data_Geek · ‎07-08-2021

So how can I extract the data out of an image? I can't even install the extra R packages on my machine that someone else had mentioned here.

markcurry · ‎07-09-2021

Hi @Idyllic_Data_Geek

If your PDF files haven't been OCR'ed, you can use this 'PDF Input (Text and Image)' tool created by @DiganP:

https://gallery.alteryx.com/#!app/PDF-Input--Text-and-Image-/5be5ec8d0462d71ffce6deaa

This tool uses 2 additional R packages (pdftools and tesseract). If you are blocked from installing R packages to your C:\Program Files\Alteryx\R-.... folder, you could try running the two attached workflows, which install them to C:\Users\<username>\Documents\R\win-library\<version> instead.

Hopefully that helps.
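For readers who can use Python rather than R, the same two-step idea (render the page to an image, then OCR it) can be sketched as below. This is an illustration under assumptions, not the Gallery tool's implementation: it assumes the third-party pdf2image and pytesseract packages plus the Poppler and Tesseract programs they depend on are installed, and the function names and the fractional-box convention are hypothetical. Because the two values always sit in the same place on the letter, the sketch crops fixed regions before running OCR and writes the result as CSV, which Excel can open.

```python
import csv
from typing import Dict, IO, Tuple

# left, top, right, bottom as fractions of the page size (0.0 to 1.0)
Box = Tuple[float, float, float, float]


def fractional_box_to_pixels(box: Box, width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a fractional box into pixel coordinates for Image.crop()."""
    left, top, right, bottom = box
    if not (0 <= left < right <= 1 and 0 <= top < bottom <= 1):
        raise ValueError(f"invalid fractional box: {box}")
    return (round(left * width), round(top * height),
            round(right * width), round(bottom * height))


def ocr_fields(pdf_path: str, regions: Dict[str, Box], dpi: int = 300) -> Dict[str, str]:
    """OCR each named region on the first page of a scanned PDF."""
    # Imported lazily: both are third-party and need external programs.
    from pdf2image import convert_from_path  # requires Poppler
    import pytesseract                       # requires the Tesseract binary

    page = convert_from_path(pdf_path, dpi=dpi, first_page=1, last_page=1)[0]
    results = {}
    for name, box in regions.items():
        crop = page.crop(fractional_box_to_pixels(box, page.width, page.height))
        results[name] = pytesseract.image_to_string(crop).strip()
    return results


def write_csv(row: Dict[str, str], out: IO[str]) -> None:
    """Write one record with a header line; Excel can open the result."""
    writer = csv.DictWriter(out, fieldnames=list(row))
    writer.writeheader()
    writer.writerow(row)
```

A call might look like `ocr_fields("letter.pdf", {"reference": (0.6, 0.1, 0.95, 0.15), "date": (0.6, 0.15, 0.95, 0.2)})`, where the file name, field names, and box coordinates are placeholders to be measured from a sample letter. When writing to a file, open it with `newline=""` as the csv module documentation recommends.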

Idyllic_Data_Geek · ‎07-09-2021

@markcurry I get the error below, as I have the 2020 version installed on my company computer.

Reading text from PDF file