We are celebrating the 10-year anniversary of the Alteryx Community! Learn more and join in on the fun here.
Start Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Detecting various CSV file types

Jake5
8 - Asteroid

Hello,

 

Is there a way for Alteryx to output as a field value the type of csv file of the input file?  More specifically, I would like to know if the csv type is CSV (Comma delimited), CSV UTF-8, CSV (Macintosh), or CSV (MS-DOS).  Thanks.

5 REPLIES 5
IraWatt
17 - Castor
17 - Castor

Hey @Jake5,

Interesting question, is there any way to differentiate these files in the file properties?

IraWatt_0-1667592042399.png

Could you attach a CSV (Macintosh)? To see if there is any way to tell them apart. 

 

 

Jake5
8 - Asteroid

Not sure either.  I have a CSV that originated as CSV (Comma Delimited) and did a save as CSV UTF-8 (Comma Delimited) and another as CSV (Macintosh)...but even in Windows all 3 files just show as CSV but do not call out the specific type of CSV.  

TimN
13 - Pulsar

Hi,

Take a look here.  It says in general It's not possible but you can try with Python.

 

https://chardet.readthedocs.io/en/latest/faq.html

 

 

danilang
19 - Altair
19 - Altair

Hi @Jake5 

 

You're looking at two different standards here.  Windows, Mac and Linux formats deal with line termination characters. ISO, UTF-8 and UTF-16 have to do with character encoding, i.e. the way each byte or group of bytes maps to a displayable character.   

 

For the first one, Windows line ending format is CR\LF (0D0A).  Original Mac format used CR(0D).  Linux and now Mac OS X use LF(0A).  So if you find a CR\LF pair in your file it's Windows, CR only it's original Mac and LF only it's Linux\Mac OS X. 

 

For the encoding formats, UTF-8 starts the file with the three bytes EF,BB,BF.  UTF-16 starts with FF,FE and the ISO formats don't have any leading characters. 

 

You can't use an Input Data tool to analyze these, because it performs all kinds of behind the scenes magic to figure these things out and present you with the data, without telling you the formats.  However, you can use a Blob Input, which reads the raw data of the file, followed by a Blob Convert to convert the blob to a series of hex bytes.  Then you just have look at the first few bytes to determine the encoding and then look for the various combinations of 0D0A to find the line ending format

 

danilang_0-1667740576911.png

 

 I've also attached the 4 csv files that I tried it with  

 

Dan

Jake5
8 - Asteroid

Thank you, @danilang 

 

Using your Mac.csv, I'm getting a LineEnding value of Windows which I gather is happening because it has located "ODOA" within the blob.  I was expecting it to return "Mac."  I've included a screenshot to help illustrate.  Also please clarify if your Elseif formula for the Type field should be updated from 2 to 4 as shown within my screenshot.  Thanks.  

 

 

Labels
Top Solution Authors