Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

How can you double check that a CSV input file is a UTF8 CSV file?

F_Latulipe
6 - Meteoroid

Does anyone know how to check in the workflow that a CSV input file is a CSV UTF8 file format?

 

I need to detect this and raise a warning that special characters may have been lost from one or more input files generated by various parties.

 

Many thanks

6 REPLIES 6
PhilipMannering
16 - Nebula
16 - Nebula

Presumably you're reading in all your files with the Code Page = "UTF-8", but want to know if previous workflows/file generators have used the wrong encoding? It only matters if 'special' characters are being used. I think these will turn up as question marks (may be somebody can confirm this) So you could look for "?" where there shouldn't be any. That depends on the data, and knowing how you might identify false positives (i.e. question marks that are actually marking the end of a question). An example might help here.

ArnaldoSandoval
12 - Quasar

Hi @F_Latulipe 

 

Interesting question, I found this thread in the community that may help you How to check for encoding or formatting issues with Excel worksheets 

 

hth

Arnaldo

F_Latulipe
6 - Meteoroid

Thank you for that @PhilipMannering and @ArnaldoSandoval.   The files are currently being generated by a Microsoft Power Automate script and saved on a Teams Channel. One of the columns is an Open Comment which can legitimately include '?' as well as special characters. The script generates the files in UTF8 format correctly.

Another team is planning to provide me with input files in a similar fashion although they will be creating the input files from excel (i.e. manually).

 

My workflow downloads the files daily from the channel then processes each in turn, and for each row tries to log warnings / errors re formatting issues (based on agreed rules).  I have no issue opening the files and reading them, Alteryx treats CSV and CSV UTF8 files the same way.   

 

What I need to be able to check is if  each file is indeed of CSV UTF8 format - so that I can automatically raise a warning / email re that the Open Comment column may have lost special characters.  i.e. detect and flag the Human error that the file type is not the one expected.

PhilipMannering
16 - Nebula
16 - Nebula

I don't think there's a way you can know for sure. You chould try using Python's chardet module. See example attached,

PhilipMannering_0-1681465805127.png

 

ArnaldoSandoval
12 - Quasar

Hi @F_Latulipe 

 

I searched for a command line enconding checker finding these options:

 

  • file.exe is a native unix console command; are you using unix?
  • I found Encoding Checker at GitHub, I downloaded and tested on my laptop, below its output:

File-Encoding-Checker-01.png

  • This utility is a promising option.
  • Unfortunatelly, it does not allow to write its output to disk, so we could use it in Alteryx
  • The good news is the source code is available in GitHub.

Questions:

Do you have access to developers with CSharp (C#) knowledge? having the source code, this utility could be enhanced in different ways, like creating a console version or adding an ouput option.

 

Let me know

 

hth

Arnaldo

 

F_Latulipe
6 - Meteoroid

Thank you both, the environment we use is Microsoft based, so no native Unix commands for me.  But I will experiment with the Python chardet module.

Labels