Does anyone know how to check in the workflow that a CSV input file is a CSV UTF8 file format?
I need to detect this and raise a warning that special characters may have been lost from one or more input files generated by various parties.
Many thanks
Presumably you're reading in all your files with the Code Page = "UTF-8", but want to know if previous workflows/file generators have used the wrong encoding? It only matters if 'special' characters are being used. I think these will turn up as question marks (may be somebody can confirm this) So you could look for "?" where there shouldn't be any. That depends on the data, and knowing how you might identify false positives (i.e. question marks that are actually marking the end of a question). An example might help here.
Hi @F_Latulipe
Interesting question, I found this thread in the community that may help you How to check for encoding or formatting issues with Excel worksheets
hth
Arnaldo
Thank you for that @PhilipMannering and @ArnaldoSandoval. The files are currently being generated by a Microsoft Power Automate script and saved on a Teams Channel. One of the columns is an Open Comment which can legitimately include '?' as well as special characters. The script generates the files in UTF8 format correctly.
Another team is planning to provide me with input files in a similar fashion although they will be creating the input files from excel (i.e. manually).
My workflow downloads the files daily from the channel then processes each in turn, and for each row tries to log warnings / errors re formatting issues (based on agreed rules). I have no issue opening the files and reading them, Alteryx treats CSV and CSV UTF8 files the same way.
What I need to be able to check is if each file is indeed of CSV UTF8 format - so that I can automatically raise a warning / email re that the Open Comment column may have lost special characters. i.e. detect and flag the Human error that the file type is not the one expected.
Hi @F_Latulipe
I searched for a command line enconding checker finding these options:
Questions:
Do you have access to developers with CSharp (C#) knowledge? having the source code, this utility could be enhanced in different ways, like creating a console version or adding an ouput option.
Let me know
hth
Arnaldo
Thank you both, the environment we use is Microsoft based, so no native Unix commands for me. But I will experiment with the Python chardet module.