Driver: Hortonworks ODBC driver for Apache Hive
Source: a CSV file tested at 10 records and at 1,887 records (each record is 18 bytes long)
Target: Apache Hive on Hortonworks (Connectivity through Knox gateway)
Workflow: Input Data tool -> Output Data tool
Scenario 1:
Load the CSV file with 10 records from Alteryx into a Hive table on the Hortonworks cluster.
File size is 10 records * 18 bytes = 180 bytes (< 1 KB).
Transaction size was set to 1000.
Execution Status: SUCCEEDED
Scenario 2:
Input file record count increased to 1,887 records.
Record length is 18 bytes, so file size is 1,887 * 18 bytes ≈ 33 KB.
Transaction size kept at the same value, 1000.
Execution Status: FAILED
Error Message: DataWrap2ODBC::SendBatch: [Hortonworks][Hardy] (34) Error from server: Bad Status: HTTP/1.1 500 Server Error. Insert into emp_alteryx(empid,empname,city) Values (?,?,?)
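For reference, the failing operation is essentially a parameterized batch insert over ODBC, committed every "transaction size" rows. Below is a minimal pyodbc sketch of the equivalent behaviour; the DSN name and sample data are assumptions for illustration only, not Alteryx's actual DataWrap2ODBC internals.

```python
# Rough sketch of a batched, parameterized ODBC insert against Hive.
# Assumptions: a DSN named "HortonworksHive" pointing at the Knox gateway,
# and an existing table emp_alteryx(empid, empname, city).
import pyodbc

records = [(i, f"name{i}", f"city{i}") for i in range(1887)]  # sample rows
transaction_size = 1000  # mirrors the Alteryx Output Data tool setting

conn = pyodbc.connect("DSN=HortonworksHive", autocommit=False)
cur = conn.cursor()

sql = "INSERT INTO emp_alteryx (empid, empname, city) VALUES (?, ?, ?)"

# Send the rows in commits of `transaction_size` records; each commit's
# payload is what has to pass through the Knox gateway in one request.
for start in range(0, len(records), transaction_size):
    batch = records[start:start + transaction_size]
    cur.executemany(sql, batch)
    conn.commit()

cur.close()
conn.close()
```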
One of the articles attributes this HTTP 500 from the Knox gateway to the replay buffer size, which defaults to 8 KB. With a transaction size of 1000, each commit of the 1,887-record (~33 KB) load carries roughly half the file, i.e. about 17.6 KB (1000 records * 18 bytes), which exceeds the 8 KB buffer, so the gateway cannot hold the commit's data and returns HTTP 500. On the other hand, keeping the transaction size very low would increase the load time for large data volumes.
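A quick way to sanity-check that reasoning is to compare the estimated per-commit payload with the 8 KB buffer. In the sketch below, approximating the payload as records * record length is an assumption, and the 8 KB default comes from the article referenced above.

```python
# Estimate the payload per commit and compare it with the Knox replay buffer.
RECORD_LENGTH = 18            # bytes per record
REPLAY_BUFFER = 8 * 1024      # Knox default replay buffer per the article (8 KB)

def payload_per_commit(total_records, transaction_size):
    rows_per_commit = min(total_records, transaction_size)
    return rows_per_commit * RECORD_LENGTH

for total in (10, 1887):
    size = payload_per_commit(total, transaction_size=1000)
    fits = size <= REPLAY_BUFFER
    print(f"{total} records -> ~{size} bytes per commit, fits in 8 KB buffer: {fits}")

# Prints:
# 10 records -> ~180 bytes per commit, fits in 8 KB buffer: True
# 1887 records -> ~18000 bytes per commit, fits in 8 KB buffer: False
```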
What surprises me is that the workflow fails even for this small volume of data (1,887 records). Any ideas or suggestions on how this buffer size can be increased, or a different solution on the Alteryx side instead of increasing the replay buffer size on the Hive service?
Note: I tried the In-DB tools as well but ended up with the same error. We don't have WebHDFS enabled on our cluster, so streaming the data to HDFS and then loading it into Hive is also not an option for me.
Accepted Solution: Changing the code page to UTF-8 did the magic. However, the performance of the data load is poor.