

How to write data faster to Hive

jainrishabh

Hi Team,

 

I am trying to build an Alteryx Designer workflow that transforms data into the desired format and then writes it to Hive.

 

We are dealing with a large amount of data: after transformation, we want to upload millions of records on each run. We connect to Hive over ODBC, using the Hortonworks Hive ODBC driver.

 

We have tried several approaches, and with each one it takes 24 to 48 hours, sometimes even more, to write the data to Hive.

 

Problem statement: We want to upload data from one database to another (Hive).

 

Approach 1: We fetch the data with the Input tool and write it to Hive with the Data Stream In tool.

 

[Image: jainrishabh_1-1622719443689.png]

 

Approach 2:

Step 1: Fetch the data from the source database using the Input / Connect In-DB tool and store it in an .avro file.

Step 2: Use the same .avro file as input and write the data to Hive using the Output / Write Data In-DB tool.

 

Both approaches take a long time to write the data to Hive.

 

When creating the ODBC connections for read and write, we use Hive ODBC. The connection shown below is used for both READ and WRITE.

 

[Image: jainrishabh_2-1622719501796.png]

 

Can someone please help me with this?

 

Thanks in advance

Rishabh

simonaubert_bd

Hello @jainrishabh 

You should use HDFS to write data to Hive faster.
For example, see https://community.alteryx.com/t5/Engine-Works/Hadoop-Performance-Considerations/ba-p/486456

 

This works for both in-memory and in-database workflows.
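To illustrate the general idea outside Alteryx: instead of pushing rows one by one through ODBC, the file is placed in HDFS and Hive reads it in bulk. Below is a minimal Python sketch that only builds the HiveQL statements for such a staging load. The function name, table names, HDFS path, and columns are all hypothetical, and it assumes the .avro files have already been copied to the HDFS directory by some other means.

```python
def hdfs_staging_statements(staging_table, target_table, hdfs_dir, columns):
    """Build HiveQL for a bulk load through an HDFS staging directory.

    All arguments are illustrative assumptions:
      staging_table / target_table: Hive table names
      hdfs_dir: HDFS directory that already contains the .avro files
      columns: list of (column_name, hive_type) pairs
    """
    column_defs = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)

    # External table: Hive reads the Avro files in place, no row-by-row insert.
    create = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {staging_table} (\n"
        f"  {column_defs}\n"
        f")\n"
        f"STORED AS AVRO\n"
        f"LOCATION '{hdfs_dir}'"
    )

    # One set-based statement copies everything into the target table.
    insert = f"INSERT INTO TABLE {target_table} SELECT * FROM {staging_table}"

    return [create, insert]


if __name__ == "__main__":
    # Hypothetical example values, not taken from the original workflow.
    for statement in hdfs_staging_statements(
        "staging_orders",
        "orders",
        "/tmp/staging/orders",
        [("id", "BIGINT"), ("amount", "DOUBLE")],
    ):
        print(statement + ";\n")
```

The point of the sketch is that Hive receives two statements in total, however many records the files contain, so the ODBC connection is no longer the bottleneck.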

Best regards,

Simon
