Hi all!
I'm writing data from Excel or an Alteryx database to Hive. A few months ago it took 6-10 hours to write 3 million rows, but now it takes far too long: after 23 hours only 15% has been written. I have free disk space, and I have already trimmed my workflow because I only need the data in Hive. I also have to go through a VPN and Kerberos, since this is my work computer.
Could you tell me what I am doing wrong, or what I can do to improve the write time?
Workflow:
Hive ODBC Driver Advanced Options:
Manage In-DB Connections:
Thanks in advance!!!
Hello @JRamos
I usually keep two In-DB connections for Hive:
1/ one with ODBC for classic use (when the data already comes from In-DB)
2/ another that writes to HDFS instead of ODBC (when the data comes from "in-memory", like an Excel import). This is the one you should use here. Write time can be reduced by a factor of 5, 10, 20...
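For a sense of scale, here is a small back-of-the-envelope calculation using only the figures quoted in the question (3 million rows, 6-10 hours before, 15% after 23 hours now). The speed-up factors are just the rough 5, 10, 20 mentioned above applied to the earlier baseline, not a measured result, so treat the output as an order-of-magnitude sketch.

```python
# Figures quoted in the question.
TOTAL_ROWS = 3_000_000
OLD_HOURS = (6, 10)              # earlier runs: 6 to 10 hours for the full load
NOW_HOURS, NOW_DONE = 23, 0.15   # current run: 15% written after 23 hours


def rows_per_second(rows: float, hours: float) -> float:
    """Average write throughput over the given duration."""
    return rows / (hours * 3600)


# Throughput of the earlier runs (fastest and slowest case).
old_rates = [rows_per_second(TOTAL_ROWS, h) for h in OLD_HOURS]

# Throughput of the current run and the projected time for the full load.
now_rate = rows_per_second(TOTAL_ROWS * NOW_DONE, NOW_HOURS)
projected_hours = NOW_HOURS / NOW_DONE

print(f"earlier: {old_rates[1]:.0f}-{old_rates[0]:.0f} rows/s")
print(f"now: {now_rate:.1f} rows/s, about {projected_hours:.0f} h for the full load")

# Rough speed-up factors applied to the earlier 6-10 h baseline (assumption).
for factor in (5, 10, 20):
    low, high = (h / factor for h in OLD_HOURS)
    print(f"x{factor}: {low:.1f}-{high:.1f} h")
```

A throughput of only a few rows per second points at per-row overhead on the write path rather than at data volume, which is why switching the write side to HDFS is worth trying first.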
Best regards,
Simon
Hi @simonaubert_bd,
Writing to HDFS like the one below? Is it correct to use the same connection string as the read ODBC one?
@JRamos About HDFS:
-No, it's not the same string at all. It is more like
Please contact your datalake admin for the exact path/configuration.
However, Alteryx should show a window like that when you choose HDFS Parquet and click on the black arrow.
-Use Parquet, not Avro.
-Use two In-DB aliases, such as HIVE_PROD_ODBC and HIVE_PROD_HDFS, since Alteryx does not distinguish between creating a table and inserting data from in-memory. Cf. this idea: https://community.alteryx.com/t5/Alteryx-Designer-Ideas/In-DB-Connexion-windows-should-be-divided-in...
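To illustrate why the two write paths behave so differently, here is a minimal Python sketch that only builds HiveQL strings. It assumes the ODBC path ends up sending rows as individual INSERT statements and that the HDFS path uploads one file and then registers it with a single statement. This is a simplification for illustration, not a description of what Alteryx does internally, and the table name and file path are hypothetical.

```python
from typing import Iterable, List, Sequence


def _literal(value: object) -> str:
    """Render a Python value as a HiveQL literal (strings quoted and escaped)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def per_row_inserts(table: str, rows: Iterable[Sequence[object]]) -> List[str]:
    """ODBC-style path (assumed): one INSERT, and one round trip, per row."""
    statements = []
    for row in rows:
        values = ", ".join(_literal(v) for v in row)
        statements.append(f"INSERT INTO TABLE {table} VALUES ({values})")
    return statements


def bulk_load(table: str, hdfs_file: str) -> List[str]:
    """HDFS-style path (assumed): the file is already on HDFS, one statement loads it."""
    return [f"LOAD DATA INPATH '{hdfs_file}' INTO TABLE {table}"]


if __name__ == "__main__":
    sample = [(1, "alpha"), (2, "beta"), (3, "gamma")]
    # Hypothetical table name and HDFS path, for illustration only.
    print(len(per_row_inserts("sales_staging", sample)), "statements via per-row inserts")
    print(len(bulk_load("sales_staging", "/tmp/sales_staging.parquet")), "statement via bulk load")
```

With 3 million rows the first function would produce 3 million statements, each paying the VPN and Kerberos round trip, while the second stays at one statement regardless of row count.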
Thank you @simonaubert_bd, I have changed the write driver to HDFS with the exact path, and now it's working!