Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Issue with uploading using HDFS

stevenwaddon
6 - Meteoroid

I have an issue with loading data to Cloudera.

 

I have tested both impala and Hive using the Simba odbc connectors and have set the write access on both to use AVRO.

It turns out these don't work very well when you are forced to use kerberos as they write row by row. To give you an idea of timings.

To load a file of 500,000 records containing 42 fields to impala took 6 hours.

I then attempt to upload the file via HDFS using the hadoop connector, this works great as the same file gets uploaded in 12 seconds.

 

Problem I have now is that although I can see the file in hdfs, I cannot view the file in either Impala or Hive. I understand why this is as I have not supplied a metadata file.

 

my question is what is the best way of doing this via alteryx.

 

Ideally I want to upload a different avro file each day that is used to then overwrite an existing table. 

 

6 REPLIES 6
DiganP
Alteryx Alumni (Retired)

@stevenwaddon Yes, writing to hive is slow. It speeds up if you make the default varchar size in the driver smaller. Have you looked at this article as well - How to Write to Hive Faster? We recommend using the stream in tool, which creates a table reference also. 

Digan
Alteryx
stevenwaddon
6 - Meteoroid

thank you

 

Interestingly, I was already doing everything that the article addressed. what resolved the issue for me was to change the default varchar size in the driver.

 

The result is that Hive imported the same 500k file in 55 seconds. 

 

Also of interest was that I also followed that same approach with the impala driver and the results were even better. the same file uploaded in 20 seconds.

 

Thank you very much for this.

 

PabloQ
5 - Atom

Hi there!

 

Could you give me a hadn and tell me how to do this?:

 


@DiganPwrote:

@stevenwaddon  make the default varchar size in the driver smaller.


Thanks in advance!

stevenwaddon
6 - Meteoroid

You need to go  to your odbc  driver configuration and amend as highlighted above. this may not be the right length for you, but by default mine was set to the max

System DSN.PNG

PabloQ
5 - Atom

Hi,

 

 

Do you mean you used Impala Driver to WRITE into HIve/Impala AVRO table?

 

Could you share an screenshot of the configuration?

JanPospisil
5 - Atom

I had the same issue and so wanted to confirm that reducing var char size solved issue.

 

However I consider this as a bug and it should be looked into by alteryx team.

Labels