Reading text from PDF file
I need to scan a PDF document for a specific piece of information and then extract it to an Excel file. Is there any way to do this in Alteryx without going the Python route? The PDF Input tool does not work for me because my employer has not paid for the upgraded functions in Alteryx. Thanks in advance!
You will have to use the PDF Input tool or Python. I don't know of any other method to do that; see the link below:
https://community.alteryx.com/t5/Alteryx-Designer-Discussions/How-To-Input-PDF-to-convert-to-Excel/t....
You can use R instead of Python; however, that is still a coding approach.
@JosephSerpis can you please assist me with this R solution?
What do you need help with?
I have a scanned letter, so I think it is an image in PDF format. I need to read two pieces of information from it, which will always be in the same place. Both the Python and the R solutions are giving me errors.
Both the Python and R approaches are about tackling text in a PDF document rather than an image. The screenshot below shows the details of the R package used in the example I shared.
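To make the text-versus-image distinction concrete, here is a minimal R sketch, not the exact code from the example I shared. It assumes the pdftools package is available, and the file name `scanned_letter.pdf` and the helper `has_text_layer` are made up for illustration. The idea is that `pdftools::pdf_text()` returns one string per page, and for pages that are only scanned images those strings typically come back empty.

```r
# Returns TRUE if at least one page contains non-whitespace text,
# i.e. the PDF has an embedded text layer.
# has_text_layer is a hypothetical helper, not part of pdftools.
has_text_layer <- function(page_text) {
  any(nzchar(trimws(page_text)))
}

pdf_path <- "scanned_letter.pdf"  # hypothetical file name

# Only runs when pdftools is installed and the file exists.
if (requireNamespace("pdftools", quietly = TRUE) && file.exists(pdf_path)) {
  page_text <- pdftools::pdf_text(pdf_path)  # one string per page
  if (has_text_layer(page_text)) {
    message("Text layer found: plain text extraction should work.")
  } else {
    message("No text layer: the pages are probably images and need OCR.")
  }
}
```

If the check reports no text layer, a text-only approach will return nothing useful and the file needs OCR first.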
So how can I extract the data from an image? I can't even install the extra R packages that someone else mentioned here on my machine.
If your PDF files haven't been OCR'ed, you can use this 'PDF Input (Text and Image)' tool created by @DiganP:
https://gallery.alteryx.com/#!app/PDF-Input--Text-and-Image-/5be5ec8d0462d71ffce6deaa
This tool uses two additional R packages (pdftools and tesseract). If you are blocked from installing R packages to your C:\Program Files\Alteryx\R-.... folder, you could try running the two attached workflows, which install them to C:\Users\<username>\Documents\R\win-library\<version>
Hopefully that helps.
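For anyone who wants to see roughly what this involves in plain R, here is a hedged sketch. It is not the code inside the gallery tool or the attached workflows. It assumes R has set the `R_LIBS_USER` environment variable to your per-user library, that the CRAN mirror URL shown is reachable from your machine, and that the English training data ships with the tesseract package. The function names and the file name `scanned_letter.pdf` are made up for illustration.

```r
# Resolve the per-user library folder; R normally sets R_LIBS_USER itself.
user_lib <- function() {
  lib <- Sys.getenv("R_LIBS_USER")
  if (!nzchar(lib)) stop("R_LIBS_USER is not set; pick a writable folder instead")
  path.expand(lib)
}

# Install pdftools and tesseract into the user library if they are missing.
install_ocr_packages <- function(lib = user_lib(),
                                 repos = "https://cloud.r-project.org") {
  dir.create(lib, recursive = TRUE, showWarnings = FALSE)
  .libPaths(c(lib, .libPaths()))  # make the user library visible to this session
  missing <- setdiff(c("pdftools", "tesseract"), rownames(installed.packages()))
  if (length(missing) > 0) install.packages(missing, lib = lib, repos = repos)
  invisible(missing)
}

# Render each page of a scanned PDF to PNG, then OCR each image.
ocr_scanned_pdf <- function(pdf_path, dpi = 300) {
  images <- pdftools::pdf_convert(pdf_path, format = "png", dpi = dpi)
  on.exit(unlink(images))  # remove the temporary PNG files afterwards
  engine <- tesseract::tesseract("eng")
  vapply(images, function(img) tesseract::ocr(img, engine = engine),
         character(1), USE.NAMES = FALSE)
}

# Example usage (uncomment to run):
# install_ocr_packages()
# page_text <- ocr_scanned_pdf("scanned_letter.pdf")  # hypothetical file name
```

Since your two values are always in the same place, you could then pick them out of `page_text` with the RegEx or Text To Columns tools once the text is back in the workflow.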