Issue with uploading using HDFS

I have an issue with loading data to Cloudera.

I have tested both impala and Hive using the Simba odbc connectors and have set the write access on both to use AVRO.

It turns out these don't work very well when you are forced to use kerberos as they write row by row. To give you an idea of timings.

To load a file of 500,000 records containing 42 fields to impala took 6 hours.

I then attempt to upload the file via HDFS using the hadoop connector, this works great as the same file gets uploaded in 12 seconds.

Problem I have now is that although I can see the file in hdfs, I cannot view the file in either Impala or Hive. I understand why this is as I have not supplied a metadata file.

my question is what is the best way of doing this via alteryx.

Ideally I want to upload a different avro file each day that is used to then overwrite an existing table.

Workflow

Best Practices

In Database

Database Connection

In-Database

Accepted answers

DiganP

@stevenwaddon Yes, writing to hive is slow. It speeds up if you make the default varchar size in the driver smaller. Have you looked at this article as well - How to Write to Hive Faster? We recommend using the stream in tool, which creates a table reference also.

All comments

DiganP

stevenwaddon

thank you DiganP

Interestingly, I was already doing everything that the article addressed. what resolved the issue for me was to change the default varchar size in the driver.

The result is that Hive imported the same 500k file in 55 seconds.

Also of interest was that I also followed that same approach with the impala driver and the results were even better. the same file uploaded in 20 seconds.

Thank you very much for this.

PabloQ

Hi there!

Could you give me a hadn and tell me how to do this?:

@DiganPwrote:
@stevenwaddon make the default varchar size in the driver smaller.

Thanks in advance!

JanPospisil

I had the same issue and so wanted to confirm that reducing var char size solved issue.

However I consider this as a bug and it should be looked into by alteryx team.

Quick Links

This months top contributors

atcodedog05 19598

Qiu 15867

binu_acs 15708

MarqueeCrew 13708

apathetichell 13703