Alteryx Designer Desktop Discussions

akpew · ‎05-11-2022

Hi all experts,

Current i face problem when i try to use output file or IN-DB, it is very slow. 30MB size csv import to hive database it tooks 2 and half hours to finish. Why is it happen? what factor decide the speed of output data from alteryx to hive database? HDFS? Internal network issues? ODBC config?

More details:

Im using lastest Simba ODBC to connect and get the data from UAT mysql database server, input data by input tools from mysql and local to browser data, it is fast and takes no more than 10 secound. In-db tools i tried also, read is fast and wirte is super slow. (no different with output tools). With this super slow speed of write data,totally unacceptable speed on real life.

At the beginning i think is hdfs or hive issues so I tried to move file to HDFS and copyFromLocal write to hive db used only around 5 secound to finish mapreduce.

Please help and i can provide more information if you need, thank you!

More inforamtion to let you guys:

Testing UAT env	Hardware	Firewall
Window server 2019 (Alteryx location)	8 cores and 32 GB ram	Close
HDFS - name node	8 cores and 32 GB ram	Close
HDFS - data node *2	4 cores and 16 GB ram per node	Close

FYI,HDFS cluter and hive is using Apache Ambari open source management platform

This picture is how Alteryx get data on testing environment.

UAT SERVER A_page-0001.jpg

Alteryx Output data setting:

Window Server ODBC config:

apathetichell · ‎05-12-2022

There are a whole slew of reasons why this could be writing slow (including internal one size fits all sql generation from Alteryx)... two things I'd try - 1) add a sample tool prior to writing with a limit of 10X the files in your .csv - this tells Alteryx and HIVE to expect a certain number of records instead of allowing an infinite write statement. 2) Try Datastream In as your write tool. I only use HIVE in conjunction with Spark/Databricks - so the writing there is through the Spark driver but I'm not seeing these issues.

akpew · ‎05-18-2022

Hi apathetichell,

Thank you for your reply,

i have test add the sample tools its slower than before
I tried Datastream, and yes it's a little bit faster then output data

I made some changing after read alteryx ODBC document

Simba ODBC -> selected

Fast SQLPrepare

Manage in-DB connections

write section

driver hive ODBC
option -> transaction size, i set 1000000

After that 150 MB data by using datastream and changed config. it taks 32 mins to finish. I see that's big improvement, but still want to know how to insert into hive more faster. The speed still not good enough, cant even finish the test case in UAT environment. And i want to know is it becaues the HDFS make hive insert from alteryx so slow? or config not setting well?

i tried read data from hive, 30 second read 150MB data from hive. That's so wired.

Looking forward for your reply and hope other guys with same experience before can give the suggestion too thanks.

Alteryx Designer Desktop Discussions

[DB-Connection] Speed of writing data to Hive DB extremely slow.

Re: Outlook 365 Input tool issue

Re: parsing text to date

Re: If formula to determined a value based on date...

Re: Running multiple alteryx workflows within alte...

Re: Selecting the columns coming after a specific ...