
Alteryx Designer Desktop Ideas

Share your Designer Desktop product ideas - we're listening!

Add Parquet data format as input & output

Please add the Parquet data format (https://parquet.apache.org/) as a read/write option for Alteryx.

 

Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.

 

Thank you.

 

Regards,

Cristian.

21 Comments
SandeepChayanam
7 - Meteor

The Parquet format is getting increasingly popular in the Hadoop world day by day. The ability to read/write this format from/to NFS and HDFS storage would add a lot of value to the product.

 

Thanks,

Sandeep.

badun
6 - Meteoroid

Hi Guys.

 

Any progress since 2015?

 

You'd really benefit from supporting Parquet for BDE.

SeanAdams
17 - Castor

Hey all,

Not sure about Parquet directly, but we have successfully tested using a Kudu database (which is also a columnar database in the Apache stack) and also using Spark SQL. That may give you a route into Parquet?
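For anyone exploring the Spark SQL route, a minimal sketch might look like the following (the table name and file path are hypothetical; this assumes a Spark SQL connection is already set up and reachable from Alteryx via ODBC):

```sql
-- Hypothetical example: expose an existing Parquet file as a Spark SQL table,
-- then query it through the same connection Alteryx uses.
CREATE TABLE sales_parquet
USING parquet
OPTIONS (path '/data/sales.parquet');

SELECT * FROM sales_parquet LIMIT 10;
```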

Rajabhathor
5 - Atom

Hi, we are currently evaluating Alteryx and I was curious how to add Parquet as a file input/output format.

 

Thanks

DurgaS
Alteryx Alumni (Retired)

Hi @Rajabhathor,

 

Data in parquet format can be stored in hive tables and accessed from the Alteryx Designer via the hive ODBC driver.


1. Create a table in Hive with "STORED AS PARQUET" (Hive 0.13 and later). Alteryx can read and write data from these tables with the Hive ODBC driver. Check the create table syntax in this article.

2. For files already stored in the PARQUET format in HDFS, use "LOAD DATA" to load the data from the HDFS file into a Hive table.

3. To write results of an Alteryx workflow back to a Hive table in the PARQUET format, set "hive.default.fileformat=PARQUET" in the Server Side Properties of the ODBC driver configuration.
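Putting the steps above together, a hedged HiveQL sketch (table, column, and path names are illustrative only; assumes Hive 0.13+):

```sql
-- 1. Create a Parquet-backed Hive table (Hive 0.13+ syntax).
CREATE TABLE sales_pq (
  id     INT,
  amount DOUBLE
)
STORED AS PARQUET;

-- 2. Load an existing Parquet file from HDFS into the table.
LOAD DATA INPATH '/data/sales.parquet' INTO TABLE sales_pq;

-- 3. For writes from Alteryx, this property is set as a
--    Server Side Property in the Hive ODBC driver configuration:
SET hive.default.fileformat=PARQUET;
```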

 

Hope these help.

badun
6 - Meteoroid

Thank you, Durga S.

Unfortunately, this is a rather weak workaround. ODBC can handle simple tasks, but the need here is to upload a file of circa 50 GB in a compressed columnar storage format.

Julien_B
8 - Asteroid

Hi all, 

 

Do you have any updates on the subject? We are looking forward to being able to easily read/write the Parquet data format!

 

 

ARich
Alteryx Alumni (Retired)
Status changed to: Not Planned

Hi,

 

Thanks for the idea. Other than the ODBC option mentioned by Durga, we don't have plans to add Parquet support, as our engine is not optimized to handle columnar data at this time.

 

Best,

Alex

Cristian
9 - Comet

"as our engine is not optimized to handle columnar data at this time"

 

https://github.com/elastacloud/parquet-dotnet

Runs on all flavors of Windows, Linux, and mobile devices (iOS, Android) via Xamarin

 

 

RM1
5 - Atom

@DurgaS,

 

We tried setting the server-side property for PARQUET, but somehow it's not working; Alteryx is still creating the table in Text format, not Parquet. We are using Hive 1.2. I tried writing to tables both ways (through In-DB tools and through the Output tool).

 

Can you suggest if anything else needs to change?