Please add Parquet data format (https://parquet.apache.org/) as read-write option for Alteryx.
Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.
The Parquet format is becoming increasingly popular in the Hadoop world. The ability to read/write this format from/to NFS and HDFS storage would add a lot of value to the product.
Any progress since 2015?
You'd really benefit from supporting Parquet for BDE.
Not sure about Parquet directly, but we have successfully tested using a Kudu database (which is also a columnar database in the Apache stack) and also using Spark SQL. That may give you a route into Parquet?
Hi, we are just now evaluating Alteryx, and I was curious how to add Parquet as a file input/output format.
Data in Parquet format can be stored in Hive tables and accessed from the Alteryx Designer via the Hive ODBC driver.
Create a table in Hive with "STORED AS PARQUET" (Hive 0.13 and later). Alteryx can then read and write data in these tables through the Hive ODBC driver.
Check the CREATE TABLE syntax in this article.
For files already stored in Parquet format in HDFS, use "LOAD DATA" to load the data from the HDFS files into a Hive table.
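The two Hive statements described above can be sketched as follows. The table name, columns, and HDFS path are hypothetical; in practice the statements would be submitted through the Hive ODBC connection or beeline, but here we just compose the SQL strings:

```python
# Hypothetical table name and HDFS directory containing Parquet files.
table = "sales_parquet"
hdfs_path = "/data/sales"

# Hive 0.13+ accepts STORED AS PARQUET directly in CREATE TABLE.
create_stmt = (
    f"CREATE TABLE {table} (id INT, amount DOUBLE, region STRING) "
    "STORED AS PARQUET"
)

# For Parquet files already in HDFS, LOAD DATA moves them into the table
# without rewriting the data.
load_stmt = f"LOAD DATA INPATH '{hdfs_path}' INTO TABLE {table}"

print(create_stmt)
print(load_stmt)
```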
To write the results of an Alteryx workflow back to a Hive table in Parquet format, set "hive.default.fileformat=PARQUET" in the Server Side Properties of the ODBC driver configuration.
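As a rough illustration of where that property lives, here is a hypothetical DSN entry. The exact key names depend on your driver and platform; in the Cloudera/Hortonworks Hive ODBC drivers, server-side properties are typically passed with an `SSP_` prefix (an assumption about your driver version — check its documentation):

```ini
; Hypothetical odbc.ini entry -- driver path and host are placeholders.
[Hive_Parquet]
Driver=/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
Host=hive-server.example.com
Port=10000
; Server-side property: make new tables default to Parquet.
SSP_hive.default.fileformat=PARQUET
```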
Hope these help.
Thank you, Durga S.
Unfortunately, this is a very weak suggestion. ODBC can handle simple tasks, but the need here is to upload files of roughly 50 GB in a compressed columnar storage format.
Do you have any updates on this subject? We look forward to being able to read/write the Parquet data format easily!
Thanks for the idea. Other than the ODBC option mentioned by Durga, we don't have plans to add Parquet support, as our engine is not optimized to handle columnar data at this time.
"as our engine is not optimized to handle columnar data at this time"
Runs on all flavors of Windows, Linux, and mobile devices (iOS, Android) via Xamarin
We tried setting up the Server Side Properties for PARQUET, but somehow it's not working: Alteryx still creates the table in text format, not Parquet. We are using Hive 1.2. I tried writing to tables both ways (through the In-DB tools and through the Output Data tool).
Can you suggest anything else that needs to change?