Hi Team,
I am trying to build an Alteryx Designer workflow that transforms the data into the desired format and then writes it to Hive.
We are dealing with a large amount of data; for example, after transformation we want to upload millions of records on each run. To connect to Hive we are using an ODBC connection with the Hortonworks Hive ODBC driver.
We have tried several approaches, and with every one it takes 24 to 48 hours, sometimes even more, to write the data to Hive.
Problem Statement: We want to upload data from one database to another (Hive).
Approach 1: We fetch the data with the Input tool and write it to Hive with the help of the Data Stream In tool.
Approach 2:
Step 1: Fetch the data from one database using the Input / Connect In-DB tool and store it in an .avro file.
Step 2: Use the same .avro file as input and store the data in Hive using the Output / Write Data In-DB tool.
Both of the above approaches take a long time to write the data to Hive.
When creating the ODBC connections for read and write, we use Hive ODBC as the connection type. The connection below is used for both READ and WRITE.
Can someone please help me with this?
Thanks in advance
Rishabh
Hello @jainrishabh
You should use HDFS to write data into Hive faster.
For example, see https://community.alteryx.com/t5/Engine-Works/Hadoop-Performance-Considerations/ba-p/486456
This works for both in-memory and In-Database workflows.
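Outside of Alteryx, the same idea (stage the file in HDFS, then let Hive pick it up, instead of inserting rows over ODBC) can be sketched roughly as below. This is only an illustration under assumptions: the paths, the table name sales.orders and the two helper functions are hypothetical, and the target table is assumed to already exist with a storage format that matches the staged file (Avro here). The helpers only build the command and the HiveQL statement as strings; running them against a cluster is left out.

```python
import shlex


def hdfs_put_command(local_path: str, hdfs_dir: str) -> str:
    """Build the shell command that copies a local file into an HDFS directory.

    Hypothetical helper: it only returns the command string, it does not run it.
    The -f flag overwrites a file of the same name if one is already staged.
    """
    return "hdfs dfs -put -f {} {}".format(
        shlex.quote(local_path), shlex.quote(hdfs_dir)
    )


def load_statement(hdfs_path: str, table: str, overwrite: bool = False) -> str:
    """Build a HiveQL LOAD DATA statement for a file already staged in HDFS.

    Assumption: `table` exists and its storage format matches the file.
    LOAD DATA INPATH moves the file into the table location, so the data
    is not pushed row by row.
    """
    escaped = hdfs_path.replace("'", "\\'")  # keep the path inside the quotes
    mode = "OVERWRITE " if overwrite else ""
    return f"LOAD DATA INPATH '{escaped}' {mode}INTO TABLE {table}"


if __name__ == "__main__":
    # Hypothetical paths and table name, for illustration only.
    print(hdfs_put_command("/tmp/orders.avro", "/staging/orders"))
    print(load_statement("/staging/orders/orders.avro", "sales.orders"))
```

The point of the sketch is that the bulk of the data travels as one file transfer into HDFS, and Hive then only has to register or move that file.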
Best regards,
Simon