UTF-8 CSV output without BOM?
This question is related to the behavior of the Output Data tool.
When creating CSV output with the Code Page option set to Unicode UTF-8, the output file contains a byte order mark (BOM) as its first character. This sequence is generally used to indicate the endianness of the text, although in UTF-8 it serves only as an encoding signature. Many consider including it in the file to be bad practice. Is there a way to configure Alteryx not to output the BOM? If not, could a feature be considered to let the user toggle it?
Here is more information about BOM:
http://en.wikipedia.org/wiki/Byte_order_mark
http://stackoverflow.com/questions/2223882/whats-different-between-utf-8-and-utf-8-without-bom
This is causing some pain, as I am creating very large text files in UTF-8 format (many GB) and have to run a post-Alteryx scripting process to remove the BOM.
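For anyone stuck with the same post-processing step, here is a minimal sketch of what such a script might look like in Python. It is not the script used above; the function name `strip_utf8_bom` and the chunk size are assumptions for illustration. It streams the file in chunks, so a multi-GB file never has to fit in memory.

```python
import shutil

# The UTF-8 BOM is the three bytes EF BB BF at the very start of the file.
UTF8_BOM = b"\xef\xbb\xbf"


def strip_utf8_bom(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path, dropping a leading UTF-8 BOM if present.

    Returns True if a BOM was found and removed, False otherwise.
    (Hypothetical helper, not part of Alteryx.)
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        head = src.read(len(UTF8_BOM))
        had_bom = head == UTF8_BOM
        if not had_bom:
            # No BOM: the bytes we peeked at are real data, keep them.
            dst.write(head)
        # Stream the remainder in fixed-size chunks.
        shutil.copyfileobj(src, dst, chunk_size)
    return had_bom
```

Writing to a separate destination file costs extra disk space, but it avoids rewriting a very large file in place and leaves the original untouched if the copy fails.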
Obviously you need to be careful when you read this CSV file back in: without the BOM, the code page can no longer be auto-detected, so you will need to specify UTF-8 on read.
I made a module that demonstrates, and also a macro that combines the logic into a simple tool for writing a UTF-8 file with no BOM.
https://www.dropbox.com/s/rsyoksoxf372050/CSV_BOM.yxzp?dl=0
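The same caution applies if the file is read outside Alteryx. As a rough sketch, assuming the consumer is a Python script (the function name `read_csv_rows` is made up for illustration), the encoding can be stated explicitly. Python's built-in `utf-8-sig` codec decodes UTF-8 and skips a leading BOM when one is present, so it works on files written either way.

```python
import csv


def read_csv_rows(path):
    """Read a UTF-8 CSV file into a list of rows, with or without a BOM.

    Hypothetical helper: "utf-8-sig" decodes UTF-8 and drops a leading
    BOM if there is one, so nothing needs to be auto-detected.
    """
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.reader(f))
```

With plain `encoding="utf-8"`, a file that does carry a BOM would still load, but the first field of the first row would begin with an invisible U+FEFF character.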
I'd love to see this option as well, as we have the same issue.
Hey @ben_stroud,
Good news! With 11.0 we're releasing the ability to output a .csv file that uses UTF-8 without a byte order mark (BOM) via a new option in the Output Data tool. You will also be able to read the .csv file without the BOM via an input tool by selecting the UTF-8 code page.
We hope this helps with the issue you described!
- @KatieH
Sr. Manager, Product Management - Alteryx Designer & Visualytics
Alteryx
Now if we can just add the ability to put in a header record and trailer record in the output my life will be much easier!
I'm having a similar problem now with JSON files: UTF-8 encoding is writing the BOM, and a client language doesn't like it.
Are there plans to roll out the "Include BOM" toggle for other file formats?
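Until such a toggle exists, the consuming side may be able to tolerate the BOM instead. The client language isn't named here, so the following is only an illustration in Python of the kind of failure involved and one way around it; `load_json_tolerating_bom` is a made-up helper name.

```python
import json


def load_json_tolerating_bom(raw: bytes):
    """Parse UTF-8 JSON bytes, ignoring a leading BOM if one is present.

    Hypothetical helper: decoding with "utf-8-sig" strips the BOM before
    the text reaches the JSON parser.
    """
    return json.loads(raw.decode("utf-8-sig"))


# Bytes as they might be written with a BOM in front of the JSON text.
payload = b"\xef\xbb\xbf" + b'{"id": 1, "name": "test"}'

# Decoding as plain UTF-8 leaves U+FEFF at the start of the string,
# which Python's json module rejects with a JSONDecodeError.
try:
    json.loads(payload.decode("utf-8"))
except json.JSONDecodeError as exc:
    print("strict parse failed:", exc)

print(load_json_tolerating_bom(payload))
```

Other languages and parsers behave differently, so whether an equivalent decode option exists depends on the client.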
I would also like this to be available as a setting on the output file, without the need for workarounds.
