Alteryx Designer Desktop Discussions

PDF to text conversion / Extraction of data from PDF files

Sumit_Das

Hello Everyone,

 

I need help with PDF-to-text conversion. While running a workflow, I get the error message below in tool ID 2.1.4 (see the attached workflow):

 

R (90) Error: .onLoad failed in loadNamespace() for 'pdftools', details:
R (90) Execution halted
R (90) The R.exe exit code (1) indicated an error.
 
I have attached a screenshot of the workflow for your reference, and the code used in each R Tool is listed below:
 
Tool id 2.1.3 (Install PDF Tool package)

# Obtain the string of the package to download
pkgs <- 'pdftools'

# Obtain the user specified directory (which may not be used)
custom_path <- scan(what = character(), sep = "\n", nmax = 1)
%Question.custom.path%

# The set of possible repositories to use
repos <- c("http://cran.revolutionanalytics.com", "http://cran.rstudio.com")
# Select a particular repository
repo <- sample(repos, 1)

# Get the path to the library folder, starting with the default case, then the
# custom case
minor_ver <- strsplit(R.Version()$minor, "\\.")[[1]][1]
R_ver <- paste(R.Version()$major, minor_ver, sep = ".")
the_path <- paste0(normalizePath("~"), "\\R\\win-library\\", R_ver)
# Create the user's personal folder if it doesn't already exist
dir.create(the_path, recursive = TRUE, showWarnings = FALSE)
print(the_path)

subDir <- "pdftools"
output_dir <- file.path(the_path, subDir)

if (!dir.exists(output_dir)) {
  # Install the package to the user's private library
  transcript <- capture.output(install.packages(pkgs, lib = the_path, repos = repo))
  print("It did not find the directory. Installing package.")
}

 

Tool id 2.1.4 (Converting PDF to Text)

# read in the PDF file location which must
# be in a field called FullPath
data <- read.Alteryx("#1",mode="data.frame")

# Use pdf_text() function to return a character vector
# containing the text for each page of the PDF
txt <- pdftools::pdf_text(file.path(data$FullPath))

# convert the character vector to a data frame
df_txt <- data.frame(txt)

# output the data frame to output stream 1
write.Alteryx(df_txt, 1)
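
Separately from the load error, the conversion step above passes the whole FullPath column to pdf_text() in one call. A minimal sketch of handling one file at a time is shown here, assuming the input may hold more than one row. The helper name extract_pages and the Page column are hypothetical and not part of the original workflow.

```r
# Hypothetical helper: builds one output row per page, tagged with its source file.
# `reader` defaults to pdftools::pdf_text, which returns one string per page.
extract_pages <- function(paths, reader = pdftools::pdf_text) {
  per_file <- lapply(paths, function(p) {
    pages <- reader(p)
    data.frame(FullPath = rep(p, length(pages)),
               Page = seq_along(pages),
               txt = pages,
               stringsAsFactors = FALSE)
  })
  # Stack the per-file data frames into a single data frame
  do.call(rbind, per_file)
}

# Inside the R Tool this might be used as (assumes a FullPath field on input #1):
# data <- read.Alteryx("#1", mode = "data.frame")
# write.Alteryx(extract_pages(as.character(data$FullPath)), 1)
```

The reader argument is only there so the helper can be tried without pdftools being loadable, for example by passing a stub function.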

 



I am currently using Alteryx version 2021.4.2.40860, running non-elevated. I suspect this is a compatibility issue with the R Tool. Do I perhaps need to use an older version of the R Tool to make this work?
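
To help narrow this down, a small diagnostic could be run in a separate R Tool. The sketch below is only an illustration: the helper name diagnose_package is hypothetical, and it assumes the full .onLoad error text (which is cut off in the message above) would show whether the problem is a missing install, a wrong library path, or a version mismatch.

```r
# Hypothetical helper (not part of the original workflow): reports whether a
# package is installed and whether its namespace loads, keeping the error text.
diagnose_package <- function(pkg) {
  # system.file() returns "" when the package is not found on .libPaths()
  installed <- nzchar(system.file(package = pkg))
  load_error <- NA_character_
  loads <- FALSE
  if (installed) {
    loads <- tryCatch({
      loadNamespace(pkg)
      TRUE
    }, error = function(e) {
      # Keep the complete message instead of letting R halt
      load_error <<- conditionMessage(e)
      FALSE
    })
  }
  list(package = pkg, installed = installed, loads = loads,
       load_error = load_error)
}

# Print the R version, the library search paths, and the result for pdftools
print(R.version.string)
print(.libPaths())
print(diagnose_package("pdftools"))
```

If installed comes back FALSE, the library folder that the install step wrote to is probably not on the search path printed by .libPaths().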

 

I would really appreciate any help with this.
