Hello,
Is there a way for Alteryx to output as a field value the type of csv file of the input file? More specifically, I would like to know if the csv type is CSV (Comma delimited), CSV UTF-8, CSV (Macintosh), or CSV (MS-DOS). Thanks.
Hey @Jake5,
Interesting question, is there any way to differentiate these files in the file properties?
Could you attach a CSV (Macintosh)? To see if there is any way to tell them apart.
Not sure either. I have a CSV that originated as CSV (Comma Delimited) and did a save as CSV UTF-8 (Comma Delimited) and another as CSV (Macintosh)...but even in Windows all 3 files just show as CSV but do not call out the specific type of CSV.
Hi,
Take a look here. It says in general It's not possible but you can try with Python.
https://chardet.readthedocs.io/en/latest/faq.html
Hi @Jake5
You're looking at two different standards here. Windows, Mac and Linux formats deal with line termination characters. ISO, UTF-8 and UTF-16 have to do with character encoding, i.e. the way each byte or group of bytes maps to a displayable character.
For the first one, Windows line ending format is CR\LF (0D0A). Original Mac format used CR(0D). Linux and now Mac OS X use LF(0A). So if you find a CR\LF pair in your file it's Windows, CR only it's original Mac and LF only it's Linux\Mac OS X.
For the encoding formats, UTF-8 starts the file with the three bytes EF,BB,BF. UTF-16 starts with FF,FE and the ISO formats don't have any leading characters.
You can't use an Input Data tool to analyze these, because it performs all kinds of behind the scenes magic to figure these things out and present you with the data, without telling you the formats. However, you can use a Blob Input, which reads the raw data of the file, followed by a Blob Convert to convert the blob to a series of hex bytes. Then you just have look at the first few bytes to determine the encoding and then look for the various combinations of 0D0A to find the line ending format
I've also attached the 4 csv files that I tried it with
Dan
Thank you, @danilang
Using your Mac.csv, I'm getting a LineEnding value of Windows which I gather is happening because it has located "ODOA" within the blob. I was expecting it to return "Mac." I've included a screenshot to help illustrate. Also please clarify if your Elseif formula for the Type field should be updated from 2 to 4 as shown within my screenshot. Thanks.
User | Count |
---|---|
63 | |
28 | |
23 | |
23 | |
22 |