Parquet is a columnar storage format used primarily in the Hadoop ecosystem. This differs from the traditional row-oriented approach. Columnar storage makes the data within each column more homogeneous and thus allows for better compression.
Data in Parquet format can be accessed via the Hive ODBC driver.
Product - Alteryx Designer
3rd party Product - Access to Hive database (version 0.13 or higher)
Create a table in your Hive database with "STORED AS PARQUET" (supported in Hive 0.13 and later). The CREATE TABLE syntax is explained in this article; see the example below.
CREATE TABLE parquet_test (
  id INT,
  value STRING  -- example columns; replace with your own schema
)
PARTITIONED BY (part STRING)
STORED AS PARQUET;
Please note that for this to work, the correct permissions to create a table must be set up.
Alternatively, find attached an example solution as an Analytic App that uses the Post Create SQL Statement option to alter the table to the Parquet format:
ALTER TABLE table_name SET FILEFORMAT PARQUET;
Please note that if you want to use the attached example, you will have to adjust the connection string and input file name to your use case.
Access data already stored in the Parquet format
For data already stored in the Parquet format in HDFS, use the LOAD DATA statement to load the HDFS files into a table in Hive.
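As a sketch, the LOAD DATA statement would look like the following; the HDFS path and table name here are placeholders, and the target table must already have been created with STORED AS PARQUET:

```sql
-- Example only: path and table name are placeholders for your environment.
-- Moves the files from the given HDFS directory into the table's storage location.
LOAD DATA INPATH '/user/hive/parquet_files' INTO TABLE parquet_test;
```

Note that LOAD DATA moves the files rather than copying them, and Hive does not convert the file format; the files must already be valid Parquet.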
Designer can read and write data from these tables via the Hive ODBC driver.
Writing data from Designer to a new Hive table in Parquet
To write the results of an Alteryx workflow back to a new Hive table in the Parquet format, set hive.default.fileformat=PARQUET in the Server Side Properties of the Hive ODBC driver configuration, under "Advanced Options". Please note that making this change requires access to the server-side configuration; typically, DBA involvement is required.
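To test the effect of this property before changing the driver configuration, the same setting can be applied per session in Hive. A minimal sketch, assuming a Hive CLI or beeline session (table and column names are placeholders):

```sql
-- Session-level equivalent of the Server Side Property:
SET hive.default.fileformat=PARQUET;

-- With the property set, a table created without an explicit
-- STORED AS clause is stored as Parquet by default.
CREATE TABLE parquet_out (id INT, value STRING);
```

The session-level SET only lasts for the current connection, whereas the Server Side Property applies to every connection made through that ODBC configuration.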