Hey All --
I am working on a workflow that reads in batches of PDF files, saves a copy using the Run command, then uses the R tool to read in those PDFs so they can be parsed. I can just use the PDF input tool from my desktop to achieve this, but I need it to run on our server, and the PDF input tool does not work on the server. This works fine with only one input, but I would like to be able to read in up to five (or an indefinite number, whatever works). The problem I am running into is that the R tool throws an error if no records flow through it, which is an issue when running on the server.
Does anyone know of a workaround so that if there are fewer than five inputs, the workflow will still run without an error?
Here is how I have my R tool configured (I'm not 100% clear on what is going on, as I just pulled this from the community).
Any help would be greatly appreciated!
Hi @RobMotiwalla ,
A loop can solve this problem for you. You can pass in a list of the file paths of your PDFs, and all of them will be read in.
This can be an Alteryx Batch Macro or a for-loop in R. See example attached.
Here is the R Code for reference:
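data <- read.Alteryx("#1", mode = "data.frame")
# Prepare an empty data frame to collect the extracted text
txt <- data.frame(temp = character(0), stringsAsFactors = FALSE)
# Loop over the PDF paths; seq_len() runs zero times when no records arrive
for (row in seq_len(nrow(data))) {
  # pdf_text() returns one string per page of the PDF
  temp <- pdftools::pdf_text(as.character(data[row, "FullPath"]))
  df <- data.frame(temp, stringsAsFactors = FALSE)
  txt <- rbind(txt, df) # union results
}
write.Alteryx(txt, 1)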
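One detail worth checking for the zero-record case from the original question: in R, 1:nrow(data) evaluates to c(1, 0) when data has no rows, so a loop written that way still runs twice, whereas seq_len(nrow(data)) runs zero times. Below is a minimal sketch of that behavior that runs outside Alteryx. The names read_pages and collect_text are hypothetical helpers made up for illustration; read_pages only stands in for pdftools::pdf_text so that no real PDF files are needed.

```r
# Hypothetical stand-in for pdftools::pdf_text(): returns one string per "page".
read_pages <- function(path) {
  paste(basename(path), "page", 1:2)
}

# Illustrative helper: loop over a data frame of paths and tolerate zero rows.
collect_text <- function(data, reader = read_pages) {
  # Typed empty frame, so the output keeps a "temp" column even with no input
  txt <- data.frame(temp = character(0), stringsAsFactors = FALSE)
  # seq_len(0) is empty, so the loop body is skipped when there are no rows
  for (row in seq_len(nrow(data))) {
    pages <- reader(as.character(data[row, "FullPath"]))
    txt <- rbind(txt, data.frame(temp = pages, stringsAsFactors = FALSE))
  }
  txt
}

empty <- data.frame(FullPath = character(0), stringsAsFactors = FALSE)
two   <- data.frame(FullPath = c("a.pdf", "b.pdf"), stringsAsFactors = FALSE)

nrow(collect_text(empty))  # 0
nrow(collect_text(two))    # 4 (two pages per file in this stand-in)
```

Whether the R tool on the server accepts a zero-row output from write.Alteryx is not confirmed here, so it may be worth testing with an empty input before relying on it.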
Please mark this as the solution if it answers your question; it will help others find solutions more quickly.
Kind Regards,
Kilian
Solutions Engineer - Alteryx
@KilianL Thank you so much, that worked perfectly!