Alteryx Designer Desktop Discussions

PDF to text conversion / Extracting data from PDF files

Sumit_Das
6 - Meteoroid

Hello everyone,

I need some help with converting a PDF file to text. When I run my workflow, I get the following error message in tool ID 2.1.4 (as numbered in the workflow):

R (90) Error: .onLoad failed in loadNamespace() for 'pdftools', details:
R (90) Execution halted
R (90) The R.exe exit code (1) indicated an error.

I have attached a screenshot of the workflow for reference. Below is the code used in each R tool.

Tool ID 2.1.3 (installs the pdftools package):

# Obtain the string of the package to download
pkgs <- 'pdftools'

# Obtain the user specified directory (which may not be used)
custom_path <- scan(what = character(), sep = "\n", nmax = 1)
%Question.custom.path%

# The set of possible repositories to use
repos <- c("http://cran.revolutionanalytics.com", "http://cran.rstudio.com")
# Select a particular repository
repo <- sample(repos, 1)

# Get the path to the library folder, starting with the default case, then the
# custom case
minor_ver <- strsplit(R.Version()$minor, "\\.")[[1]][1]
R_ver <- paste(R.Version()$major, minor_ver, sep = ".")
the_path <- paste0(normalizePath("~"), "\\R\\win-library\\", R_ver)
# Create the user's personal folder if it doesn't already exist
dir.create(the_path, recursive = TRUE, showWarnings = FALSE)
print(the_path)

subDir <- "pdftools"
output_dir <- file.path(the_path, subDir)

if (!dir.exists(output_dir)) {
  print("It did not find the directory. Installing package.")
  # Install the package to the user's private library
  transcript <- capture.output(install.packages(pkgs, lib = the_path, repos = repo))
}
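Checking only whether the `pdftools` folder exists means a partially installed or broken copy is never reinstalled. A minimal sketch of an alternative check follows, assuming base R's `requireNamespace()` is acceptable here; the helper name `needs_install` and the test package name are hypothetical and not part of the original workflow.

```r
# Hypothetical helper (not part of the original workflow): returns TRUE when
# `pkg` cannot be loaded from the library folder(s) given in `lib`.
# Unlike dir.exists(), this also flags a package whose folder exists but whose
# namespace fails to load, for example because its .onLoad() hook errors out.
# Note: if the namespace is already loaded in this session, it reports FALSE.
needs_install <- function(pkg, lib) {
  !requireNamespace(pkg, lib.loc = lib, quietly = TRUE)
}

# Possible use in tool 2.1.3, in place of the dir.exists() check:
# if (needs_install(pkgs, the_path)) {
#   install.packages(pkgs, lib = the_path, repos = repo)
# }

# Stand-alone checks: a base package loads, a made-up name does not
print(needs_install("stats", .libPaths()))
print(needs_install("notARealPackage123", .libPaths()))
```

Reinstalling whenever the load fails will not help if the failure comes from an incompatibility rather than a damaged install, so this is a convenience, not a fix for the error above.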

Tool ID 2.1.4 (converts PDF to text):

# Read in the PDF file location(s), which must
# be in a field called FullPath
data <- read.Alteryx("#1", mode = "data.frame")

# pdf_text() returns a character vector containing the text of each page.
# It reads one PDF at a time, so apply it to every incoming path.
txt <- unlist(lapply(as.character(data$FullPath), pdftools::pdf_text))

# Convert the character vector to a data frame (keep text as strings, not factors)
df_txt <- data.frame(txt, stringsAsFactors = FALSE)

# Output the data frame in stream 1
write.Alteryx(df_txt, 1)
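Once `pdftools` loads, the single `txt` column loses track of which file and page each row came from. A minimal sketch of keeping that information follows; the helper `pages_to_df`, its column names, and the example path are hypothetical, and only `pdftools::pdf_text()`, `read.Alteryx()` and `write.Alteryx()` come from the original code.

```r
# Hypothetical helper: turn the per-page character vector returned by
# pdftools::pdf_text() into one row per page, tagged with its source file.
pages_to_df <- function(path, pages) {
  data.frame(
    FullPath = rep(path, length(pages)),
    Page = seq_along(pages),
    Text = pages,
    stringsAsFactors = FALSE
  )
}

# Inside the R tool this could be applied to every incoming path, e.g.:
# data <- read.Alteryx("#1", mode = "data.frame")
# df_txt <- do.call(rbind, lapply(as.character(data$FullPath), function(p)
#   pages_to_df(p, pdftools::pdf_text(p))))
# write.Alteryx(df_txt, 1)

# Stand-alone check with made-up page text (no PDF needed)
example <- pages_to_df("C:/docs/report.pdf", c("page one text", "page two text"))
print(example)
```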

I am currently using Alteryx version 2021.4.2.40860 (non-elevated). I suspect this is a compatibility issue with the R tool. Do I need to use an older version of the R tool to make it work?
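Before downgrading anything, it may help to have R report exactly what it sees, since the log above shows nothing after "details:". The following is a hedged diagnostic sketch meant for a separate R tool (it is not part of the original workflow): it prints the R version Alteryx runs, the library folders R searches, where `pdftools` is installed and which R version it was built under, and the full error text from loading it. A mismatch between the `Built` column and the running R version is one possible cause of `.onLoad` failures, but that is an assumption to check, not a confirmed diagnosis.

```r
# Diagnostic sketch for a separate R tool (not part of the original workflow)

# Which R does Alteryx run, and which library folders does it search, in order?
print(R.version.string)
print(.libPaths())

# Where is pdftools installed, and under which R version was it built?
ip <- installed.packages()
if ("pdftools" %in% rownames(ip)) {
  print(ip[rownames(ip) == "pdftools", c("LibPath", "Version", "Built"), drop = FALSE])
} else {
  print("pdftools is not installed in any library on .libPaths()")
}

# Try to load the namespace and keep the full error text instead of halting
result <- tryCatch({
  loadNamespace("pdftools")
  "pdftools loaded successfully"
}, error = function(e) conditionMessage(e))
print(result)
```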

Any help with this would be really appreciated.
