
Alteryx Designer Discussions

SOLVED

UTF-8 CSV output without BOM?

ben_stroud
7 - Meteor

This question is related to the behavior of the Output Data tool.

When creating CSV output with the Code Page option set to Unicode UTF-8, the output file begins with a byte order mark (BOM). This sequence is used to explicitly indicate the endianness of the text. Many consider including it in the file to be bad practice. Is there a way to configure Alteryx not to output the BOM? If not, could a feature be considered that lets the user toggle it?

Here is more information about BOM:

http://en.wikipedia.org/wiki/Byte_order_mark
http://stackoverflow.com/questions/2223882/whats-different-between-utf-8-and-utf-8-without-bom

This is causing some pain, as I am creating very large text files in UTF-8 format (many GB) and have to run a post-Alteryx scripting process to remove the BOM.
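
For readers wondering what such a post-processing step might look like, here is a minimal Python sketch. It is not the script used above; the function name `strip_utf8_bom` and the chunk size are assumptions made for illustration. It streams the file in chunks, so a multi-GB file never has to fit in memory.

```python
import codecs
import shutil


def strip_utf8_bom(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path, dropping a leading UTF-8 BOM if present.

    Returns True if a BOM was found and removed, False otherwise.
    (Hypothetical helper, not part of Alteryx.)
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # The UTF-8 BOM is the three bytes EF BB BF at the start of the file.
        head = src.read(len(codecs.BOM_UTF8))
        had_bom = head == codecs.BOM_UTF8
        if not had_bom:
            # No BOM: these bytes are real data, so keep them.
            dst.write(head)
        # Stream the rest of the file in fixed-size chunks.
        shutil.copyfileobj(src, dst, chunk_size)
    return had_bom
```

The copy still rewrites the whole file, so on very large outputs the cost is dominated by disk I/O.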

Ned
Alteryx Alumni (Retired)
As you know, the CSV driver puts the BOM there and there isn't any way to get rid of it in that tool, but it's easy to do the conversion manually and write without it. Using a multi-field formula, you can convert all your fields to UTF-8 and then leave them narrow. Then you write a CSV and pick latin-1 (a little white lie), which is the default narrow code page in Alteryx, so no conversion takes place.

Obviously you need to be careful when you read this CSV file back in: it won't be able to auto-detect the code page any more. You will need to specify UTF-8 on read.
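
The idea behind the "white lie" can be illustrated outside Alteryx with a small Python sketch. This only models the byte-level trick using Python's `latin-1` codec; it is an assumption that Alteryx's narrow strings behave the same way, and the helper names `write_utf8_via_latin1` and `read_utf8` are made up for this example.

```python
import codecs


def write_utf8_via_latin1(path, text):
    """Write text as UTF-8 bytes by way of a latin-1 'narrow' string."""
    utf8_bytes = text.encode("utf-8")
    # Reinterpret every UTF-8 byte as one latin-1 character. This mimics
    # holding UTF-8 data in a narrow (single-byte) string field.
    narrow = utf8_bytes.decode("latin-1")
    # Writing as latin-1 maps each character back to the same byte, so
    # the file contains plain UTF-8 and no BOM is added.
    with open(path, "w", encoding="latin-1", newline="") as f:
        f.write(narrow)


def read_utf8(path):
    """Read the file back, stating UTF-8 explicitly since no BOM marks it."""
    with open(path, "r", encoding="utf-8", newline="") as f:
        return f.read()
```

In Python, reading with `encoding="utf-8-sig"` instead accepts files both with and without a BOM, which can help when the two kinds are mixed.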

I made a module that demonstrates, and also a macro that combines the logic into a simple tool for writing a UTF-8 file with no BOM.

https://www.dropbox.com/s/rsyoksoxf372050/CSV_BOM.yxzp?dl=0
 
ben_stroud
7 - Meteor
Good workaround. Thanks
ben_stroud
7 - Meteor
Following up on this. The workaround seemed to work, but due to the data size and the large number of fields, it took much longer to process than just using the output tool (many hours in this case). Please consider adding a "UTF-8 (without BOM)" encoding option to the output tool in future releases. It would be greatly appreciated, as we need the functionality when churning through around 600GB of very wide records (~2000 fields).
jgreene
8 - Asteroid

I'd love to see this option as well as we have the same issue.

KatieH
Alteryx Alumni (Retired)

Hey @ben_stroud,

Good news! With 11.0 we're releasing the ability to output a .csv file that uses UTF-8 without a byte order mark (BOM) via a new option in the Output Data tool. You will also be able to read the .csv file without the BOM via an input tool by selecting the UTF-8 code page.

 

We hope this helps with the issue you described!

@KatieH

Katie Haralson
Sr. Manager, Product Management - Alteryx Designer & Visualytics
Alteryx
NJT
11 - Bolide

Now if we can just add the ability to put a header record and a trailer record in the output, my life will be much easier!

Shlomo
5 - Atom

I'm having a similar problem now with JSON files: UTF-8 encoding is writing the BOM, and a client language doesn't like it.

 

Are there plans to roll out the "Include BOM" toggle for other file formats?

I would also like it to be available as a setting for the output file itself, without the need for workarounds.
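
As an illustration of why a consumer might reject such a file, here is a small Python sketch. The client language in question is not named above, so Python is only a stand-in, and the payload is invented for the example. Python's `json` module refuses a string that starts with a BOM, while decoding with `utf-8-sig` strips it first.

```python
import codecs
import json

# Invented sample: a JSON document preceded by the UTF-8 BOM bytes.
payload = codecs.BOM_UTF8 + b'{"id": 1}'

try:
    # Decoding as plain UTF-8 keeps the BOM as a leading U+FEFF character,
    # which the JSON parser rejects.
    json.loads(payload.decode("utf-8"))
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False

# Decoding as utf-8-sig drops the BOM, so parsing succeeds.
record = json.loads(payload.decode("utf-8-sig"))
```

A consumer that cannot change its decoder would need the BOM removed from the file before parsing.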
