
Re: Inputting Data in Chinese, Japanese and Korean Characters

Marcel_Gavrila

Hello,

Do you know how I can read CJK text from a PDF? I have some R code, but since I am new to R, I don't know how to amend it so that the data is converted to Unicode. My code is below.

# install a package if it is missing, then load it
cond.install <- function(package.name) {
  options(repos = "http://cran.rstudio.com") # set repo
  # check for the package in the library; install it if missing
  if (!(package.name %in% rownames(installed.packages()))) {
    install.packages(package.name)
  }
  require(package.name, character.only = TRUE)
}

cond.install("pdftools")
cond.install("tesseract")

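file <- "C:\\Users\\PDF\\file.pdf"
# render each PDF page to a PNG image
pngfile <- pdftools::pdf_convert(file, dpi = 200)
# run OCR on the rendered pages
text <- tesseract::ocr(pngfile)
# send the OCR text to output 1 and the file path to output 2
write.Alteryx(text, 1)
write.Alteryx(file, 2)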

 

Thank you,

Marcel
