Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

[DB-Connection] Speed of writing data to Hive DB extremely slow.

akpew
5 - Atom

Hi all experts,

 

Current i face problem when i try to use output file or IN-DB, it is very slow. 30MB size csv import to hive database it tooks 2 and half hours to finish. Why is it happen? what factor decide the speed of output data from alteryx to hive database? HDFS? Internal network issues? ODBC config? 

More details: 

Im using lastest Simba ODBC to connect and get the data from UAT mysql database server, input data by input tools from mysql and  local to browser data, it is fast and takes no more than 10 secound. In-db tools i tried also, read is fast and wirte is super slow. (no different with output tools). With this super slow speed of write data,totally unacceptable speed on real life.

At the beginning i think is hdfs or hive issues so I tried to move file to HDFS and copyFromLocal write to hive db used only around 5 secound to finish mapreduce. 

 

Please help and i can provide more information if you need, thank you!

 

More  inforamtion to let you guys:

Testing UAT envHardwareFirewall
Window server 2019 (Alteryx location)8 cores and 32 GB ramClose
HDFS - name node8 cores and 32 GB ramClose
HDFS - data node *2 4 cores and 16 GB ram per nodeClose
   

FYI,HDFS cluter and hive is using Apache Ambari open source management platform

 

This picture is how Alteryx get data on testing environment.

 

UAT SERVER A_page-0001.jpg

Alteryx Output data setting:

PC-Setting.png

 

Window Server ODBC config:

PC-ODBC1.png

PC-ODBC2.png

 
2 REPLIES 2
apathetichell
19 - Altair

There are a whole slew of reasons why this could be writing slow (including internal one size fits all sql generation from Alteryx)... two things I'd try - 1) add a sample tool prior to writing with a limit of 10X the files in your .csv - this tells Alteryx and HIVE to expect a certain number of records instead of allowing an infinite write statement. 2) Try Datastream In as your write tool. I only use HIVE in conjunction with Spark/Databricks - so the writing there is through the Spark driver but I'm not seeing these issues.

akpew
5 - Atom

Hi  apathetichell,

Thank you for your reply,

  1. i have test add the sample tools its slower than before 
  2. I tried Datastream, and yes it's a little bit faster then output data

I made some changing after read alteryx ODBC document 

Simba ODBC -> selected

  • Fast SQLPrepare

Manage in-DB connections 

write section 

  • driver hive ODBC 
  • option -> transaction size, i set 1000000

After that 150 MB data by using datastream and changed config. it taks 32 mins to finish. I see that's big improvement,  but still want to know how to insert into hive more faster.  The speed still not good enough, cant even finish the test case in UAT environment.  And i want to know is it becaues the HDFS make hive insert from alteryx so slow? or config not setting well?

i tried read data from hive, 30 second read 150MB data from hive.  That's so wired. 

Looking forward for your reply and hope other guys with same experience before can give the suggestion too thanks. 

 

Labels
Top Solution Authors