I'm using the attached R tool to import PDF files. It works great. The only complaint I have is that it will not import more than one PDF file at once. It's not the end of the world if it can't - I can always append my PDFs, and then import. I'd just like to have the option. Does anyone know how to do more than one PDF at once?
My R code is pretty messy but I think:
data <- read.Alteryx("#1", mode="data.frame")
df_txt <- data.frame(file = 'Top', txt = 'Top')
for (fullpath in data$FullPath) {
  txt <- pdftools::pdf_text(file.path(fullpath))
  df_new <- data.frame(file = fullpath, txt = txt)
  df_txt <- rbind(df_txt, df_new)
}
df_txt <- df_txt[-1,]
write.Alteryx(df_txt, 1)
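This should read a list of file paths from the FullPath column and write everything out as one big block, with a filename column so you can tell which PDF each row of text came from.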
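If the placeholder 'Top' row bothers you, here is a rough sketch of the same idea without it. The helper name collect_pdf_text, its reader argument, and the extra page column are my own additions for illustration, not part of the tool; it still assumes your input has a FullPath column and that pdftools is installed.

```r
# Hypothetical helper: collect the text of several PDFs into one data frame.
# `reader` defaults to pdftools::pdf_text, which returns one string per page;
# it is a parameter only so the function can be tried without real PDFs.
collect_pdf_text <- function(paths, reader = pdftools::pdf_text) {
  paths <- as.character(paths)  # guard against FullPath arriving as a factor
  per_file <- lapply(paths, function(p) {
    pages <- reader(p)
    data.frame(
      file = rep(p, length(pages)),  # repeat the path once per page
      page = seq_along(pages),       # page number within that file
      txt = pages,
      stringsAsFactors = FALSE
    )
  })
  # Stack the per-file data frames into a single block
  do.call(rbind, per_file)
}

# Inside the Alteryx R tool (assumes the incoming stream has a FullPath column):
# data <- read.Alteryx("#1", mode = "data.frame")
# write.Alteryx(collect_pdf_text(data$FullPath), 1)
```

The two commented lines at the bottom are what would go in the R tool itself. I have only sketched this, so treat it as a starting point rather than a tested workflow.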
Another (messy) way of doing this is to read in the data frame in chunks. I've included some example code here which works. This essentially makes your R code behave like a batch macro that runs once per input row.
data <- read.Alteryx.First("#1", 1, mode="data.frame")
while (!is.null(data)) {
  write.Alteryx(data, 2)
  # Use pdf_text() function to return a character vector
  # containing the text for each page of the PDF
  txt <- pdftools::pdf_text(file.path(data$FullPath))
  # convert the character vector to a data frame
  df_txt <- data.frame(txt)
  # output the data frame in stream 1
  write.Alteryx(df_txt, 1)
  data <- read.Alteryx.Next("#1", mode="data.frame")
}
Pretty sure there is a way to simplify this and do the sequential write inside a loop without chunking the initial input, but I was having trouble getting that set up (R is not my most used language).
I think this one is a little over my head. I don't know what to do after bringing in the macro.