How can you double-check that a CSV input file is a UTF-8 CSV file?
Does anyone know how to check, within the workflow, that a CSV input file is in the CSV UTF-8 file format?
I need to detect this and raise a warning that special characters may have been lost from one or more input files generated by various parties.
Many thanks
Presumably you're reading in all your files with Code Page = "UTF-8", but want to know whether earlier workflows/file generators have used the wrong encoding? It only matters if 'special' characters are being used. I think these will turn up as question marks (maybe somebody can confirm this), so you could look for "?" where there shouldn't be any. That depends on the data, and on knowing how you might identify false positives (i.e. question marks that are actually marking the end of a question). An example might help here.
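For what it's worth, that question-mark heuristic could be sketched like this in Python (the regex and the sample rows are only an illustration, not a rule that will fit every data set):

```python
import re

# A genuine question mark usually ends a sentence, while a '?' left
# behind by a lossy encoding tends to sit inside a word or appear
# doubled. Flag the latter two patterns.
suspect = re.compile(r"\w\?\w|\?\?")

rows = ["Is this correct?", "Caf?? latte", "Se?or Gomez"]
flagged = [row for row in rows if suspect.search(row)]
print(flagged)  # ['Caf?? latte', 'Se?or Gomez']
```

You would still want to eyeball the flagged rows, since free-text comments can legitimately contain odd punctuation.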
Hi @F_Latulipe
Interesting question. I found a thread in the community that may help you: "How to check for encoding or formatting issues with Excel worksheets".
hth
Arnaldo
Thank you for that, @PhilipMannering and @ArnaldoSandoval. The files are currently being generated by a Microsoft Power Automate script and saved to a Teams channel. One of the columns is an Open Comment, which can legitimately include '?' as well as special characters. The script generates the files in UTF-8 format correctly.
Another team is planning to provide me with input files in a similar fashion, although they will be creating the input files from Excel (i.e. manually).
My workflow downloads the files daily from the channel and processes each in turn; for each row it tries to log warnings/errors about formatting issues (based on agreed rules). I have no issue opening and reading the files; Alteryx treats CSV and CSV UTF-8 files the same way.
What I need is a way to check whether each file really is in the CSV UTF-8 format, so that I can automatically raise a warning/email that the Open Comment column may have lost special characters, i.e. detect and flag the human error of a file being saved as the wrong type.
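One way to pre-check each downloaded file, sketched here in Python (the function name and paths are placeholders, not part of any Alteryx tool), is to look for the UTF-8 byte-order mark that Excel's "CSV UTF-8" option writes, and to attempt a strict UTF-8 decode:

```python
def check_csv_utf8(path):
    """Return (has_bom, decodes_as_utf8) for the file at path."""
    with open(path, "rb") as f:
        raw = f.read()
    # Excel's "CSV UTF-8 (Comma delimited)" variant writes a byte-order
    # mark (EF BB BF) at the start of the file.
    has_bom = raw.startswith(b"\xef\xbb\xbf")
    # Windows-1252 special characters (e.g. a lone 0xF1 for 'ñ') are
    # not valid UTF-8, so a strict decode fails on mis-encoded files.
    try:
        raw.decode("utf-8")
        decodes = True
    except UnicodeDecodeError:
        decodes = False
    return has_bom, decodes
```

One caveat: a plain-ASCII file decodes as UTF-8 too, so the BOM is the stronger signal that the "CSV UTF-8" option was actually chosen; the decode test only catches files that already contain mis-encoded bytes.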
Hi @F_Latulipe
I searched for a command-line encoding checker and found these options:
- file.exe is a native Unix console command; are you using Unix?
- I found Encoding Checker on GitHub; I downloaded and tested it on my laptop (its output is below):
- This utility is a promising option.
- Unfortunately, it does not allow writing its output to disk, so we can't use it in Alteryx.
- The good news is that the source code is available on GitHub.
Questions:
Do you have access to developers with C# knowledge? With the source code available, this utility could be enhanced in different ways, such as creating a console version or adding an output option.
Let me know
hth
Arnaldo
Thank you both. The environment we use is Microsoft-based, so no native Unix commands for me, but I will experiment with the Python chardet module.
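For anyone trying the same route, a minimal chardet sketch (chardet is a third-party package, and the sample bytes here are made up): feed it a file's raw bytes and inspect its best guess. A file that begins with the UTF-8 byte-order mark is reported as UTF-8-SIG, which is exactly the signature Excel's "CSV UTF-8" option leaves behind.

```python
import chardet  # third-party: pip install chardet

# Raw bytes of a hypothetical Excel "CSV UTF-8" export: BOM first,
# then UTF-8 encoded text ('Señor' as 0xC3 0xB1 for the 'ñ').
bom_csv = b"\xef\xbb\xbfName,Comment\r\nGomez,Se\xc3\xb1or\r\n"
guess = chardet.detect(bom_csv)
print(guess["encoding"])  # UTF-8-SIG
```

For files without a BOM, `chardet.detect` returns only a statistical guess with a confidence score, so it is worth combining with a strict UTF-8 decode rather than trusting it alone.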
