I have an issue with loading data to Cloudera.
I have tested both Impala and Hive using the Simba ODBC connectors, and have set the write option on both to use Avro.
It turns out these don't work very well when you are forced to use Kerberos, as they write row by row. To give you an idea of timings, loading a file of 500,000 records with 42 fields into Impala took 6 hours.
I then tried uploading the file to HDFS using the Hadoop connector. This works great: the same file gets uploaded in 12 seconds.
The problem I have now is that although I can see the file in HDFS, I cannot view it in either Impala or Hive. I understand why: I have not supplied a metadata file.
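For context, what seems to be missing is a table definition in the metastore that points at the directory holding the uploaded file. Below is a minimal sketch of such a definition, built as a string in Python so it can be printed and pasted into a SQL step. The database name, table name, and HDFS paths are hypothetical placeholders, and it assumes an Avro schema file (.avsc) has also been uploaded to HDFS.

```python
def avro_external_table_ddl(db, table, data_dir, schema_url):
    """Build a CREATE EXTERNAL TABLE statement for Avro files already in HDFS.

    data_dir is the HDFS directory that contains the Avro file(s), not the file itself.
    All names and paths passed in are illustrative placeholders.
    """
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {db}.{table}\n"
        f"STORED AS AVRO\n"
        f"LOCATION '{data_dir}'\n"
        f"TBLPROPERTIES ('avro.schema.url'='{schema_url}')"
    )


# Hypothetical names, for illustration only.
ddl = avro_external_table_ddl(
    "staging",
    "daily_load",
    "/user/alteryx/daily_load",
    "hdfs:///user/alteryx/schemas/daily_load.avsc",
)
print(ddl)
```

If the table is created through Hive, Impala will probably need an INVALIDATE METADATA on that table before it shows up there.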
My question is: what is the best way of doing this via Alteryx?
Ideally, I want to upload a different Avro file each day, which is then used to overwrite an existing table.
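To make the daily overwrite concrete, here is a rough sketch of the statements that could run after each upload, assuming the table already exists, the Avro schema does not change from day to day, and the file was uploaded to a staging path outside the table's own directory. The database, table, and path names are hypothetical.

```python
def daily_overwrite_statements(db, table, staged_file):
    """Return the SQL statements for replacing a table's data with a newly staged file.

    LOAD DATA INPATH moves the staged HDFS file into the table's directory,
    and OVERWRITE removes the previous contents first.
    REFRESH is Impala-only; it matters when the load was run through Hive.
    """
    return [
        f"LOAD DATA INPATH '{staged_file}' OVERWRITE INTO TABLE {db}.{table}",
        f"REFRESH {db}.{table}",
    ]


# Hypothetical names, for illustration only.
statements = daily_overwrite_statements(
    "staging", "daily_load", "/user/alteryx/incoming/daily_load.avro"
)
for statement in statements:
    print(statement + ";")
```

These could be run as a post-SQL step or from any SQL client. The file upload itself would still go through the Hadoop connector, which is the fast path.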