Now that it's live, don't forget to accept your certification badge on Credly today! Learn more here.
We are currently experiencing an issue with Email verification at this time and working towards a solution. Should you encounter this issue, please click on the "Send Verification Button" a second time and the request should go through. If the issue still persists for you, please email support@alteryx.com for assistance.

Alteryx Designer Knowledge Base

Definitive answers from Designer experts.

Error: "Primary Key required for Update option" when using Update option for Output on Hive

gtorres8
Alteryx
Alteryx
Created

Issue


When using the "Update: Insert if New" or "Update: Error on Update Failure" output options for the Output Data tool when connecting to Hive ODBC, the following error occurs:
 

Primary Key required for Update option


 

Environment Details


 

  • Alteryx Designer
    •  2020.4+
  • Hive
    •  3.1.3
  • ODBC Hive
    • Cloudera 2.6.9+
    • Simba 2.6.8+


Diagnosis


Hive Primary Key capabilities are only available on 3.x of Hive. No primary key constraint exists on the table using the Update option in the Output Data tool. Only Primary keys will work as a constraint, Unique keys constraints will not qualify. Or due to an underlying bug in Hive, the schema name is not referenced in the SQL query.
 

Cause A


No Primary key constraint exists on the table.



Cause B


An underlying Hive bug is interfering.



Solution


A Hive table must have a Primary Key constraint for the Update option to function.

Alternatively, it's possible to add DDL scripts to create a new table or alter an existing table with the Primary Key constrain as follows.

 

NOTE

 

These DDL scripts will require an elevated or DBA permission role or request your DBA for assistance to run these commands. And these commands are meant to be run only once and can be removed from the PreSQL Query section.

 

Create a new table with a Primary key constraint built-in. Example DDL Script:
 

CREATE TABLE MyTBL_W_PK 
       (id int,
        first string,
        last string,
        primary key(id) 
        disable novalidate);

 

Or use the Alter table DDL script within the PreSQL Query section of the Output Data tool for an existing table. Example DDL Script:
 

ALTER TABLE MyTBL_W_PK ADD CONSTRAINT pk_id PRIMARY KEY (id) DISABLE NOVALIDATE;


 

Workaround


If the table in question does have a primary key in the table and the error still occurs, this can occur due to an existing defect (HIVE-23935) in Hive. The workaround is to ensure to explicitly specify the schema name in the SQL query.

Also, follow the requirements for Hive transaction/update support to successfully use the UPDATE feature. See the Configuration on this site.

Example process:
 
CREATE TABLE mytable (id INT, name STRING, PRIMARY KEY(id) DISABLE NOVALIDATE) CLUSTERED BY (id) INTO 2 BUCKETS STORED AS ORC TBLPROPERTIES('transactional'='true');

set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

INSERT INTO TABLE mytable VALUES (1, 'joe');

UPDATE mytable SET name = ‘fred’ WHERE id = 1;


Additional Resources

 
No ratings