community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx Knowledge Base

Definitive answers from Designer experts.

How to Write to Hive Faster

Alteryx
Alteryx

Hive ODBC can be slow when writing to tables. If you are looking for a faster option to write to Hive and want to create a new table or overwrite an existing table, use the IN-DB tools to output your data. 

 

The IN-DB connections have an option to write directly to HDFS using AVRO: 

DC14.png

 

When using this option to write, the data is first written to HDFS and then a corresponding table reference is created in Hive. 

  

To write out data from a workflow that doesn't already use IN-DB tools, use the Data Stream In tool

DC13.png

 

NOTE: This option does not work for updates, only when creating new tables or overwriting existing ones. It also requires the user to have access to write to HDFS.

Comments
Alteryx Partner
Henriette, When using Alteryx In database to write to Hive, it seems not to use a comma or tab delimited file format as standard. It is using some kind of return unspecified format that is creating down-stream issues for me. Can I: 1) specify the table format or underlying file type when creating table or 2) alter the file format after creating it. Thanks.
Alteryx
Alteryx

Hi @OJ_TU

We support csv and avro files to write via HDFS. Selecting csv will write out a comma delimited file. 

Atom

Hi @HenrietteH

 

In our case, we have an Excel file with +500k registers and we need to append into  Impala Parquet table.

 

In our flow we use Data stream In Tool (to move info from Excel to "In-DB" tools) and we use and In-DB Output tool with ODBC. We are messing up the cluster with the tiny files created to ODBC.

 

I have a couple of questions about your approach:

 

  1. Can we use this AVRO format with Impala tables or just Hive ones?
  2. If we use Hive tables with this Avro format, we will still need to use the  Data Stream In Tool (because Excel file is not in In-DB Tools). In this step we can spend from 1 to 3 hours. Do you have anny suggestion to improve that step?

Thanks in advance!

 

Meteoroid

There are some serious performance issues while writing to HIVE in ORC format. Any solutions available?.