Using tesseract package in R with Alteryx Designer

NeilFisk
9 - Comet

Hello Community,

 

I have been searching for an existing use case that applies the tesseractOCR_pdf package to extract data from a scanned PDF within Alteryx Designer, so that I can cleanse the data downstream with the built-in tools. Has anyone had any luck using the R Tool and loading the packages to work with a scanned PDF?

 

Thanks,
Neil

1 REPLY
NeilFisk
9 - Comet

I may have answered my own question.  After installing the tesseract package, I placed the following code in the R Tool:

 

# read in the PDF file location which must
# be in a field called FullPath
File <- read.Alteryx("#1", mode="data.frame")

# Use tesseract::ocr() to run OCR on the file and return
# a character vector containing the recognized text
Data <- tesseract::ocr(file.path(File$FullPath))

# convert the character vector to a data frame
df_Data <- data.frame(Data)

# write the data frame to output stream 1
write.Alteryx(df_Data, 1)
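
A possible variation on the code above, offered only as a sketch: the tesseract package documentation describes rendering PDF pages to images with pdftools::pdf_convert() before running OCR, which also gives one row of text per page. This assumes the pdftools package is installed alongside tesseract in the R environment that Designer uses. The helper name ocr_pdf_pages, the 300 dpi setting, and the output column names Page and Text are illustrative choices, not anything from Alteryx or the packages themselves.

```r
# Sketch only: assumes the tesseract and pdftools packages are installed
# in the R environment used by the Alteryx R Tool.

# Hypothetical helper: OCR every page of a single scanned PDF
ocr_pdf_pages <- function(pdf_path, dpi = 300) {
  # temporary folder for the rendered page images, removed on exit
  out_dir <- tempfile("ocr_pages_")
  dir.create(out_dir)
  on.exit(unlink(out_dir, recursive = TRUE))

  # render each page of the PDF to its own PNG file
  n_pages <- pdftools::pdf_info(pdf_path)$pages
  png_files <- file.path(out_dir, sprintf("page_%03d.png", seq_len(n_pages)))
  pdftools::pdf_convert(pdf_path, format = "png", dpi = dpi,
                        filenames = png_files)

  # ocr() returns one character string per image file
  text <- tesseract::ocr(png_files)

  data.frame(FullPath = pdf_path, Page = seq_len(n_pages), Text = text,
             stringsAsFactors = FALSE)
}

# read in the PDF file locations from the FullPath field
File <- read.Alteryx("#1", mode = "data.frame")

# OCR each PDF and stack the per-page results into one data frame
results <- lapply(as.character(File$FullPath), ocr_pdf_pages)
df_Data <- do.call(rbind, results)

# write the data frame to output stream 1
write.Alteryx(df_Data, 1)
```

A higher dpi value generally gives the OCR engine more detail to work with, at the cost of slower rendering. The sketch expects at least one row in the incoming FullPath field.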
