Hello,
Do you know how I can read CJK text from a PDF? I have some R code, but since I am new to R, I don't know how to amend it so that the extracted data comes out as Unicode. Below is my code.
cond.install <- function(package.name) {
  options(repos = "http://cran.rstudio.com") # set repo
  # check for the package in the library; install it if missing, then load it
  if (!(package.name %in% rownames(installed.packages()))) {
    install.packages(package.name)
  }
  require(package.name, character.only = TRUE)
}
cond.install("pdftools")
cond.install("tesseract")
file <- "C:\\Users\\PDF\\file.pdf"
pngfile <- pdftools::pdf_convert(file, dpi = 200)
text <- tesseract::ocr(pngfile)
write.Alteryx(text, 1)
write.Alteryx(file,2)
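A minimal sketch of one possible direction, assuming the problem is that `tesseract::ocr()` falls back to its English model when no engine is supplied. The helper name `ocr_cjk_pdf`, its parameters, and the choice of `"chi_sim"` (Simplified Chinese; `"chi_tra"`, `"jpn"`, and `"kor"` are other tesseract language codes) are illustrative assumptions, not part of the code above:

```r
# Hypothetical helper (not from the original post): OCR a PDF with a CJK model.
ocr_cjk_pdf <- function(pdf_path, language = "chi_sim", dpi = 300) {
  # Download the trained data once if this language is not installed yet
  if (!(language %in% tesseract::tesseract_info()$available)) {
    tesseract::tesseract_download(language)
  }
  engine <- tesseract::tesseract(language)

  # Render each page to a PNG (written to the working directory),
  # then run OCR on the images with the CJK engine instead of the default
  png_files <- pdftools::pdf_convert(pdf_path, format = "png", dpi = dpi)
  text <- tesseract::ocr(png_files, engine = engine)

  # One row per page; enc2utf8() makes sure the strings are marked as UTF-8
  data.frame(page = seq_along(text),
             text = enc2utf8(text),
             stringsAsFactors = FALSE)
}

# Possible usage inside the Alteryx R tool (untested assumption):
# result <- ocr_cjk_pdf("C:\\Users\\PDF\\file.pdf", language = "chi_sim")
# write.Alteryx(result, 1)
```

If the PDF already contains a text layer rather than scanned images, `pdftools::pdf_text(file)` may return the characters directly without any OCR step.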
Thank you,
Marcel