Alteryx Designer Desktop Discussions

SOLVED

Importing multiple PDF files at once using R tool

oracleoftemple
9 - Comet

I'm using the attached R tool to import PDF files.  It works great.  The only complaint I have is that it will not import more than one PDF file at once.  It's not the end of the world if it can't - I can always append my PDFs, and then import.  I'd just like to have the option.  Does anyone know how to do more than one PDF at once?

4 REPLIES
JoeS
Alteryx

I have converted your process into a batch macro.

 

You can then send through a list of file paths, and it will return the text of each file in turn.

 

You may want to add more parsing within the macro, but hopefully that will get you started.

 

 

jdunkerley79
ACE Emeritus

My R code is pretty messy, but I think this should work:

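data <- read.Alteryx("#1", mode = "data.frame")

# Placeholder row so rbind() has matching columns to append to.
# stringsAsFactors = FALSE keeps both columns as plain text.
df_txt <- data.frame(file = 'Top', txt = 'Top', stringsAsFactors = FALSE)

for (fullpath in as.character(data$FullPath)) {
	# pdf_text() returns one string per page of the PDF
	txt <- pdftools::pdf_text(fullpath)
	df_new <- data.frame(file = fullpath, txt = txt, stringsAsFactors = FALSE)
	df_txt <- rbind(df_txt, df_new)
}
# Drop the placeholder row
df_txt <- df_txt[-1, ]

write.Alteryx(df_txt, 1)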

This should read a list of file paths from the input and write everything out as one big block, one row per page, with a filename column.
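For anyone who wants to avoid the placeholder row, here is a minimal sketch of the same idea using lapply(). The names read_pdfs and read_pages are hypothetical (they are not part of Alteryx or pdftools), and the page column is an addition of mine. It assumes the incoming data has a FullPath column, as in the code above.

```r
# Hypothetical helper: read every PDF in `paths` and return one row per page.
# `read_pages` defaults to pdftools::pdf_text, which returns one string per page;
# it is a parameter only so the function can be tried without real PDF files.
read_pdfs <- function(paths, read_pages = pdftools::pdf_text) {
  pieces <- lapply(paths, function(p) {
    txt <- read_pages(p)
    data.frame(
      file = rep(p, length(txt)),   # repeat the path once per page
      page = seq_along(txt),        # page number within the file
      txt = txt,
      stringsAsFactors = FALSE
    )
  })
  if (length(pieces) == 0) {
    # No input paths: return an empty frame with the same columns
    return(data.frame(file = character(0), page = integer(0),
                      txt = character(0), stringsAsFactors = FALSE))
  }
  do.call(rbind, pieces)
}

# Inside the Alteryx R tool (assumed usage, not run here):
# data <- read.Alteryx("#1", mode = "data.frame")
# write.Alteryx(read_pdfs(as.character(data$FullPath)), 1)
```

Collecting the pieces in a list and binding them once at the end also avoids growing the data frame on every pass through the loop, which may matter if the list of PDFs is long.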

Claje
14 - Magnetar

Another (Messy) way of doing this is to read in the data frame in chunks.  I've included some example code here which works.  This essentially turns your R code into a batch macro that runs once per line.

 

data <- read.Alteryx.First("#1", 1, mode="data.frame")
while (!is.null(data))
{
    # Pass the current input row through to output 2
    write.Alteryx(data, 2)

    # Use pdf_text() function to return a character vector
    # containing the text for each page of the PDF
    txt <- pdftools::pdf_text(file.path(data$FullPath))

    # Convert the character vector to a data frame
    df_txt <- data.frame(txt)

    # Output the data frame in stream 1
    write.Alteryx(df_txt, 1)

    # Read the next row; the loop stops once data comes back NULL
    data <- read.Alteryx.Next("#1", mode="data.frame")
}



Pretty sure there is a way to simplify this and do the sequential write inside a loop without chunking the initial input, but I was having trouble getting that set up (R is not my most used language).

oracleoftemple
9 - Comet

I think this one is a little over my head.  I don't know what to do after bringing in the macro.
