The PDF Input tool is designed to allow users to input the content of PDFs into Alteryx for further analysis.
Documentation for the macro can be found here:
An example workflow using this macro can be found here:
This macro, and especially the R code within it, is based on this blog post by Oliver Power:
Click here to download the tool:
This is an amazing tool and has been really useful to me over the past few years. However, I'm trying to find a solution to a problem, and it's been hard to find one in the forum. I'd like to know whether it's possible to use this tool in workflows that are uploaded to Alteryx Server. I've noticed that an error occurs when I try to run workflows that contain this tool.
Is it possible to use this tool in Alteryx Server?
Thank you so much and a really happy 2022 to everyone! Take care.
Hi @mlemesx - this article from The Information Lab has instructions on installing R packages on both Designer and Server.
Good afternoon, Blake.
Thank you so much for providing the link to install the R packages.
As mentioned in the instructions, R.exe needs to be run before using the tool. However, after following the instructions, I couldn't find any file named "R.exe" in my Alteryx folder. When I tried to run the tool, I got the error message below:
I searched and came across the suggestion below in the forums:
"It looks like the PDF tool you're trying to use is R-based. Please make sure you have the predictive tools installed".
"Predictive tools" are no longer available in the installer section; instead, I see "Alteryx Intelligence Suite". Would downloading that work for running this tool?
Hey @OErdinc, when you get to the designer download page (downloads.alteryx.com), you can see the predictive tools in the same list as the normal designer downloads.
Does anyone know why Alteryx 2021.3 (with R 4.0+) won't install pdftools, while it works on Alteryx 2021.1 (with R 3.6)?
The solution in this post explains how to update to the latest version of pdftools so that it works with 2021.3.
It's actually the Rcpp package that needs to be updated, using the R.exe in your Alteryx installation.
I followed the directions and it now works for me on 2021.3.
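For anyone who would rather script that update than type into an R console, here is a minimal Python sketch. It assumes the R installed with the Predictive Tools includes the standard Rscript.exe front end (which accepts an R expression via -e), and the install path below is a hypothetical placeholder, so check your own Alteryx folder for the actual R directory. It also assumes the machine can reach CRAN and that you can write to the R library, which may require admin rights.

```python
import subprocess
from pathlib import PureWindowsPath

# HYPOTHETICAL location of the R binaries installed with the Predictive Tools;
# replace "R-x.y.z" with the R folder that actually exists in your install.
ALTERYX_R_BIN = PureWindowsPath(r"C:\Program Files\Alteryx\R-x.y.z\bin\x64")

CRAN_MIRROR = "https://cloud.r-project.org"


def build_install_command(r_bin: PureWindowsPath, packages: list[str],
                          repo: str = CRAN_MIRROR) -> list[str]:
    """Return an Rscript command line that (re)installs the given packages from CRAN."""
    pkg_vector = ", ".join(f'"{p}"' for p in packages)
    expr = f'install.packages(c({pkg_vector}), repos = "{repo}")'
    return [str(r_bin / "Rscript.exe"), "-e", expr]


def install_packages(r_bin: PureWindowsPath, packages: list[str],
                     dry_run: bool = True) -> list[str]:
    """Build the command and, unless dry_run is set, run it on this (Windows) machine."""
    cmd = build_install_command(r_bin, packages)
    if not dry_run:
        # check=True raises CalledProcessError if R exits with a non-zero code
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run: print the command so you can inspect it before executing anything.
    print(install_packages(ALTERYX_R_BIN, ["Rcpp", "pdftools"]))
```

Passing dry_run=False actually runs R, so leave it at the default until the printed path points at a real Rscript.exe.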
@PhilipL, which link references the Rcpp package? I don't see it in any of the links referenced.
Sorry, I didn't post the link.
Here it is:
Awesome, thanks! That link definitely helps. Now it's time to play the run-around game with my IT department, as Cylance flags "internet.dll" in R 4.0+ and quarantines it. As a result, I can't update any packages inside R 4.0+.
Is there a way to use Designer 2021.3 AND R 3.6.3?
Unable to install packages after R update: unable to access index for repository: internet routines ...
I don't know whether using R 3.6.3 with 2021.3+ is possible, or what the adverse effects might be.
My instinct says stick with R 4.0.
It might be worth opening an Alteryx Support ticket to get more input.
Can I use this tool at all on a PC where I don't have admin rights? The documentation says to install pdftools via R.exe run with elevated privileges, which I can't do on this machine. So no PDF input for me?
It may be that you just need admin rights long enough to install the R tools.
Check with your IT to see if an Admin can help you with that install step.
I've installed the "pdftools" R package and the macro, but when I use the PDF Input tool, I get the following error:
PDF Input (1) BatchPDFInput (26): Record #1: Tool #2: Error: .onLoad failed in loadNamespace() for 'pdftools', details:
PDF Input (1) BatchPDFInput (26): Record #1: Tool #2: Execution halted
PDF Input (1) BatchPDFInput (26): Record #1: Tool #2: The R.exe exit code (1) indicated an error.
PDF Input (1) BatchPDFInput (26): The output connection "Output7" was not valid
Can anyone help me out?
Same here, no luck. It seems the new 2021.4 can't run the PDF Input tool; I get the same error.
Hopefully you've sorted out your issue by now, but if not, you'll need to update the version of Rcpp. It's all documented here:
I just returned from the Alteryx Inspire conference and became interested in this topic. I followed the steps, but I get an error (see the attached screenshot). I'd appreciate any help. My current version of Alteryx Designer is 2022.1.1.25127.
Hey everyone! I've just downloaded the PDF Input macro, but it only seems to work for one-page PDFs. When I try to convert a PDF with two or more pages, it doesn't output anything. Does anyone know if it's possible to solve this? Thank you in advance!
Hi @lucas_miranda I just tested the macro on a 3 page pdf and it works as expected.
If you want to track the page number as well, you can edit the inner macro BatchPDFInput.yxmc to include a Record ID tool before the Text To Columns tool:
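To illustrate why a Record ID placed before Text To Columns gives you page numbers, here is a small Python sketch of the same logic. It assumes, as with the R pdftools function pdf_text(), that the text arrives as one string per page and that Text To Columns then splits each page on newlines into rows; pages_to_rows is a hypothetical name, not part of the macro.

```python
def pages_to_rows(pages: list[str]) -> list[tuple[int, str]]:
    """Turn one-string-per-page text into (page_number, line) rows."""
    rows = []
    # The Record ID step: numbering records before splitting labels each page.
    for page_number, page_text in enumerate(pages, start=1):
        # The Text To Columns (split to rows) step: one row per line of text.
        for line in page_text.split("\n"):
            rows.append((page_number, line))
    return rows


if __name__ == "__main__":
    for row in pages_to_rows(["Invoice 001\nTotal: 10", "Page two text"]):
        print(row)
```

If the Record ID came after the split instead, it would number lines rather than pages, which is why its position in the macro matters.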
Hi @OllieClarke, like @lucas_miranda, I'm also having this issue: a 2-page PDF is not being read. Could you try again, perhaps with a PDF that has more data in it? I'm using the PDF Input tool. Thank you!
@bogdansheremeta @lucas_miranda could you possibly share a problem pdf as I'm not able to replicate your issue...
For the free PDF Input tool below, where exactly is the field for selecting the PDF I want to bring in to analyze? I see it in the instructions, but not on the tool in the workflow when I click on it. Please see the screenshot for what I see. What am I missing?
@dandreas You need the actual PDF tool. It looks like this:
@OllieClarke, thank you for the reply! I tested it again with the simplest two-page PDF file I could make and, surprisingly, it worked. But it still doesn't read some PDF files that I need to convert. Unfortunately, I'm not allowed to share client information, but the pattern I've observed is that it usually doesn't work for "old-looking" PDFs, the ones you can't copy (Ctrl + C) information from. I'm sorry for my limited explanation haha.
Thanks in advance!
@lucas_miranda Ah, the issue here is that my tool uses an R library to bring in the text from the PDF; there is no OCR going on. So if you feed it a PDF that doesn't have any text data built into it (i.e. it's basically an image), my tool won't work. You need to be able to select text in the PDF.
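Since the tool only reads an existing text layer, one quick way to spot problem files is to check whether any text came back at all. The Python sketch below assumes you already have the extracted text as one string per page (as pdf_text() in pdftools returns); the function names are hypothetical, and an empty page is only a heuristic sign of an image-only (scanned) PDF, not proof.

```python
def image_only_pages(pages: list[str]) -> list[int]:
    """Return 1-based page numbers whose extracted text is empty or whitespace only."""
    return [n for n, text in enumerate(pages, start=1) if not text.strip()]


def looks_scanned(pages: list[str]) -> bool:
    """Heuristic: every page came back without text, so the PDF is probably just images."""
    return bool(pages) and len(image_only_pages(pages)) == len(pages)


if __name__ == "__main__":
    extracted = ["Quarterly report\nRevenue: 100", "   \n", ""]
    print(image_only_pages(extracted))  # [2, 3]
    print(looks_scanned(extracted))     # False
```

Files flagged this way would need an OCR step before their text can be parsed, which this macro does not do.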
@OllieClarke Oh, I see, thank you for this explanation! Your tool has been very useful regardless. Cheers!
Hi! Is this also compatible with version 2022.1? How can we install the R packages on an offline server?
Is there anything available using Python? My company doesn't have R readily available!
I had been using a workflow containing the PDF reader successfully for the past 2 months, but I recently switched laptops and haven't been able to install the add-ins/packages successfully. I'm getting the error below; any advice or suggestions, please?
@Maky_b I appreciate that your post was a while ago, but if it's still an issue, you'll need to install pdftools (using this: https://community.alteryx.com/t5/Community-Gallery/Install-R-Packages/ta-p/878756) and possibly also update your package versions (which is described here: https://community.alteryx.com/t5/Alteryx-Designer-Discussions/PDF-Input-tool-Error-Message/m-p/80134...)