Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Upload Parquet to HDFS

JonnyR
7 - Meteor

I know from other peoples posts on here that Alteryx doesn't natively support outputting columnar data such as parquet files. I was wondering if anybody has created a workaround for this for uploading to hdfs. We're using Alteryx v11.7, so ideally a solution which uses the capabilities of this version but I'm also interested in the best method going forward as we update Alteryx. Are there any clever shenanigans which can be done using the run command tool?

Thanks for your help!

4 REPLIES 4
lepome
Alteryx Alumni (Retired)

I think Durga's article has been reattributed here.

Lisa LePome
Principal Support Engineer -- Knowledge Management Coach
Alteryx, Inc.
Mogul1
6 - Meteoroid

For some reason, I am getting an "Access Denied" message when I try to access that link.  Do you know why that might be?

Thanks in advance.

lepome
Alteryx Alumni (Retired)

@Mogul1   Yes, that article was archived because it was found to be inaccurate.  I'm sorry to be the bearer of bad news.  I had tried to do something similar and mostly succeeded.  Here is a part of what worked for me.  It is not officially supported at the moment.

To create the table:
Server-side PropertiesServer-side PropertiesScreenshot of the workflow I used to Create the tableScreenshot of the workflow I used to Create the table

Text in the Comment tool:

This is a test workflow to create to a parquet table

To use it, follow these steps:
Make sure that you have an ODBC connection to Hive with the following settings in Server Side Properties:

hive.default.fileformat=parquet
hive.exec.dynamic.partition=true
hive.exec.dynamic.partition.mode=nonstrict

In the Input Data tool replace the connection that's there with your own.

Replace the Post-SQL statement with one appropriate to your use case:

Set the tablename you want to create and set the partition columns separately from the rest of the data in the PARTITIONED BY argument. If you have more than one column that the table is partitioned by, then list them in order in the field separated by commas.

Set the table to open as desired. You will not use the data here. Alteryx fails to execute Post-SQL statement if no data at all are returned, so you need something, but anything at all will do.

 

Text version of the Post SQL statement I used:
CREATE TABLE parquet_test (storenum int, address string, city string, state string, zip string, lat double, lon double, county string, region string) PARTITIONED BY (type string) STORED AS PARQUET;

 

Screenshot of Input Data ToolScreenshot of Input Data Tool

 

To append:
Screenshot of workflow to append to tableScreenshot of workflow to append to table

 

 

Text in Comment tool:

This is a test workflow to append to a parquet table

To use it, follow these steps:
Make sure that you have an ODBC connection to Hive with the following settings in Server Side Properties:

hive.default.fileformat=parquet
hive.exec.dynamic.partition=true
hive.exec.dynamic.partition.mode=nonstrict

Open up the macro and replace the connection that's there with your own. If you change the name of the temporary table from lltest to something else, you will also have to change the Post-SQL statement correspondingly.

Save the macro with the new data connection.

The macro should then update in the workflow.

Set the tablename you wish to append in the Text Input tool, and set the partition columns there as well. If you have more than one column that the table is partitioned by, then list them in order in the field separated by commas.

The partition columns must be the last fields coming into the macro, so if they were not already, you'll want to use a Select tool before the macro.

For this test, you can just select sample data you wish to append to the target table. Replace the Input Data tool with whatever you need to supply that data.

 

Configuration of Macro's Post-SQL statementConfiguration of Macro's Post-SQL statement

text of the Post-Create SQL Statement I used:  INSERT INTO parquet_test PARTITION (type) Select * FROM lltest;
DROP TABLE lltest;Action tool configAction tool config

Other action tool configOther action tool config

I wish you the best of luck.

Lisa LePome
Principal Support Engineer -- Knowledge Management Coach
Alteryx, Inc.
Labels
Top Solution Authors