[DB-Connection] Speed of writing data to Hive DB extremely slow.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi all experts,
Current i face problem when i try to use output file or IN-DB, it is very slow. 30MB size csv import to hive database it tooks 2 and half hours to finish. Why is it happen? what factor decide the speed of output data from alteryx to hive database? HDFS? Internal network issues? ODBC config? More details: Im using lastest Simba ODBC to connect and get the data from UAT mysql database server, input data by input tools from mysql and local to browser data, it is fast and takes no more than 10 secound. In-db tools i tried also, read is fast and wirte is super slow. (no different with output tools). With this super slow speed of write data,totally unacceptable speed on real life. At the beginning i think is hdfs or hive issues so I tried to move file to HDFS and copyFromLocal write to hive db used only around 5 secound to finish mapreduce.
Please help and i can provide more information if you need, thank you!
More inforamtion to let you guys:
FYI,HDFS cluter and hive is using Apache Ambari open source management platform
This picture is how Alteryx get data on testing environment.
Alteryx Output data setting:
Window Server ODBC config: |
- Labels:
- Database Connection
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
There are a whole slew of reasons why this could be writing slow (including internal one size fits all sql generation from Alteryx)... two things I'd try - 1) add a sample tool prior to writing with a limit of 10X the files in your .csv - this tells Alteryx and HIVE to expect a certain number of records instead of allowing an infinite write statement. 2) Try Datastream In as your write tool. I only use HIVE in conjunction with Spark/Databricks - so the writing there is through the Spark driver but I'm not seeing these issues.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi apathetichell,
Thank you for your reply,
- i have test add the sample tools its slower than before
- I tried Datastream, and yes it's a little bit faster then output data
I made some changing after read alteryx ODBC document
Simba ODBC -> selected
- Fast SQLPrepare
Manage in-DB connections
write section
- driver hive ODBC
- option -> transaction size, i set 1000000
After that 150 MB data by using datastream and changed config. it taks 32 mins to finish. I see that's big improvement, but still want to know how to insert into hive more faster. The speed still not good enough, cant even finish the test case in UAT environment. And i want to know is it becaues the HDFS make hive insert from alteryx so slow? or config not setting well?
i tried read data from hive, 30 second read 150MB data from hive. That's so wired.
Looking forward for your reply and hope other guys with same experience before can give the suggestion too thanks.
