Alteryx Designer Desktop Discussions

SOLVED

Encoding Error in R tool

Toshi92
8 - Asteroid

I want to use the R tool on Japanese files encoded in Shift-JIS.

I think I have to pass the encoding='cp932' parameter in the R script.

But nothing I have tried works...

 

20180119.PNG

 

I attached my flow.

Would someone please check my workflow?

 

Thanks,

 

Toshi

SydneyF
Alteryx Alumni (Retired)

Hi @Toshi92

 

I believe the encoding errors you are experiencing are related to how Windows handles the generic read and write functions in R. If you are interested, you can read more about it here

 

Can you please tell me about your operating system? Which Windows version are you using, and which display language is it set to (Japanese or English)?

 

I've developed a workaround that uses an R package, readr, to read your data into the R tool and write it back out. It is not the most beautiful workaround, but it works. For it to run on your machine, you will need to go into the R tool and change the setwd() call to the folder your workflow is saved in.

 

2018-01-29_9-27-40.png

 

 

 

The output will be saved in the same folder as outputtest.csv. When you read this file into Alteryx, you will need to change the input encoding setting to UTF-8.

 

2018-01-29_9-28-33.png

 

 

You could edit any of the R-based tools to work this way. You will just need to add an output tool and a block until done tool, so that the input to the R tool is written out of Alteryx and then read back in with the readr package.

 

Does this all make sense? Are there any questions I can answer for you? Please let me know!

Toshi92
8 - Asteroid

Hi, SydneyF

 

My OS is Windows 7 Pro, the language is Japanese.

 

I changed the file path, but I still have the error below.

 

20180130-1.PNG

 

The "readr" library was downloaded correctly by your workflow.

But there is an encoding error even when I only load "readr".

 

20180130-2.PNG

 

Could you confirm it?

 

 

Toshi

 

SydneyF
Alteryx Alumni (Retired)

Hi @Toshi92

 

Is it possible this message is just saying "package 'readr' was built under R version 3.3.3"? I get this message when I load readr, but it is a warning and does not necessarily imply any impact on the functionality of the package. I am still able to use the functions of the package without issue.

 

2018-01-31_8-14-30.png

 

Are you still able to run the readr functions (read_csv and write_csv) with the readr package loaded?

 

Thank you!

Toshi92
8 - Asteroid

Hi, SydneyF

 

Thank you for your support.

 

There is still one error.

20180201-1.PNG

 

 

The data inserted to R tool has no problem. Column names are displayed in Japanese.

20180201-2.PNG

 

 

 

Regards,

 

Toshi,

SydneyF
Alteryx Alumni (Retired)

Hi @Toshi92

 

I believe this error is related to the configuration of the read.Alteryx.First() function in the script you are referencing. One of the arguments in the function (%Question.chunk.size%) is calling an environmental constant, which is set by an interface tool in the macro the code is lifted from, but is absent in the workflow you are working with. If you change this argument to a number (e.g., 500), the script will run successfully; however, the output will still have encoding errors.

 

The reason I suggest using the readr package is to avoid passing the Japanese characters into and out of R through the generic read/write functions, which are not handled appropriately on Windows. In the code I posted, you do not use the read.Alteryx functions at all. Instead, you export your data to a .csv and read that .csv in with the readr function read_csv, which is better equipped to handle the encoding.
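As a rough illustration of the readr round trip described above, here is a minimal sketch. The column names and the temporary file are made up for the example, and it assumes readr is already installed; it is not the exact script from the attached workflow.

```r
library(readr)

# Hypothetical sample data with Japanese column names, written with \u escapes
# so that the script itself contains only ASCII characters.
df <- data.frame(a = 1:2, b = 3:4)
names(df) <- c("\u58f2\u4e0a", "\u6570\u91cf")

# write_csv() always writes UTF-8, regardless of the Windows locale.
tmp <- tempfile(fileext = ".csv")
write_csv(df, tmp)

# Tell read_csv() explicitly which encoding the file uses.
back <- read_csv(tmp, locale = locale(encoding = "UTF-8"))

# The Japanese column names should survive the round trip.
print(all(names(back) == names(df)))
```

If the file coming out of Alteryx were written as Shift-JIS instead, the same call with locale(encoding = "CP932") would be the place to say so.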

 

Thank you!

Toshi92
8 - Asteroid

Hi, @SydneyF

I'm afraid I attached the wrong picture; it was from my old workflow...

This is the new one, but the same issue occurs.

20180202-1.PNG

 

The temporary CSV is written out correctly...

20180202-2.PNG

 

 

Regards,

 

Toshi,

 

SydneyF
Alteryx Alumni (Retired)

Hi @Toshi92

 

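It looks like you have backslashes in your working directory path. In an R string a single backslash is treated as an escape character, so the path needs forward slashes (or doubled backslashes) instead. Can you please try pasting the following code into the R Tool in your workflow and let me know if it works for you?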

 

#set wd to read from/write to
setwd("C:/Test/EncodingError_WorkAround")

# helper function to install R packages. source_location optional
install_package <- function(package_name, source_location = FALSE) {
	# grab the alteryx repo
	altx.repo <- getOption("repos")
	# set your primary repo if you haven't already
	altx.repo["CRAN"] <- "http://cran.rstudio.com"  
	options(repos = altx.repo)
	#check to see if the package we want is already installed
	# if it is, we will just load it, otherwise we will install it
	if(package_name %in% rownames(installed.packages())==FALSE){ 
		if(source_location==FALSE) {
			install.packages(package_name)
		} else {
			install.packages(source_location, repos = NULL, type = "source")
		}
	}
	require(package_name, character.only=TRUE)
}

# install readr
install_package("readr")

library(readr)

# read the temporary CSV as UTF-8 (locale is an argument of read_csv, not as.matrix)
X <- as.matrix(read_csv("japanesetest.csv", locale = locale(encoding = "UTF-8")))
X[is.na(X)] <- 0
n_rows <- nrow(X)
iota <- matrix(1, nrow = 1, ncol = n_rows)
XtX <- t(X) %*% X
Y <- iota %*% X

# Calculate the similarity or distance matrix
the_matrix <- XtX/(sqrt(t(Y) %*% Y))
the_matrix <- cbind(colnames(the_matrix), the_matrix)
colnames(the_matrix)[1] <- "rownames"

# write the result next to the workflow; write_csv always writes UTF-8
write_csv(as.data.frame(the_matrix), "outputtest.csv")
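On the backslash point, a small sketch of one way to convert a Windows-style path before handing it to setwd(). The variable names are made up for the example; the path is the same test folder used in the script.

```r
# A Windows-style path as it has to be typed in R source:
# every backslash is doubled because "\" starts an escape sequence.
win_path <- "C:\\Test\\EncodingError_WorkAround"

# Replace each backslash with a forward slash (fixed = TRUE avoids regex escaping).
r_path <- gsub("\\", "/", win_path, fixed = TRUE)

print(r_path)  # "C:/Test/EncodingError_WorkAround"
```

For a folder that already exists on disk, normalizePath(path, winslash = "/") should give a similar result on Windows.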

Thanks!

 

 

Toshi92
8 - Asteroid

Hi, SydneyF

 

Thank you for your kindness.

But it doesn't finish...

 

The messages are below.

20180205-1.PNG

 20180205-2.PNG

 

 

 

I attached the "outputtest.csv" produced by the new workflow.

Its contents were garbled by encoding errors...

20180205-3.PNG

 

 

 

 

I think we are on the last mile...

 

 

Toshi

 

SydneyF
Alteryx Alumni (Retired)

Hi @Toshi92,

 

Do you still have a read.Alteryx() function included in your script?  I suspect that this might be the cause of the R Tool conversion message you are seeing.

 

However, it appears to me that the script itself is running as expected. When I load the file you attached into Alteryx with Unicode UTF-8 selected as the Code Page option, it appears to be read in properly.

 

2018-02-05_8-23-41.png

 

 

I am also able to read the file properly in Excel by using Data > From Text and selecting 65001: Unicode (UTF-8) as the File Origin.

 

2018-02-05_8-28-46.png

The write_csv function used by the script encodes all columns as UTF-8. Will this work for you, or do you need the encoding to be Shift-JIS? Please let me know.
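If Shift-JIS output does turn out to be required, one possible approach is to re-encode the UTF-8 file afterwards with base R's iconv(). This is only a sketch: the file names and sample lines are made up, and it assumes the platform's iconv knows the "CP932" encoding name (it normally does on Windows).

```r
# Hypothetical UTF-8 input file standing in for outputtest.csv.
utf8_path <- tempfile(fileext = ".csv")
sjis_path <- tempfile(fileext = ".csv")

# Sample lines with Japanese text, written with \u escapes to keep the script ASCII.
lines_utf8 <- c("\u540d\u524d,\u5024", "\u6771\u4eac,1")
writeLines(enc2utf8(lines_utf8), utf8_path, useBytes = TRUE)

# Read the file as UTF-8, convert each line to CP932 (Shift-JIS),
# and write the converted bytes out unchanged.
txt <- readLines(utf8_path, encoding = "UTF-8")
sjis <- iconv(txt, from = "UTF-8", to = "CP932")
writeLines(sjis, sjis_path, useBytes = TRUE)

# Sanity check: converting back should reproduce the original text.
round_trip <- iconv(readLines(sjis_path, warn = FALSE), from = "CP932", to = "UTF-8")
print(all(round_trip == lines_utf8))
```

Characters that have no Shift-JIS equivalent would come back as NA from iconv(), so it is worth checking the converted lines before writing them.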

 

Thank you!

 

SydneyF

 

 

 

 
