This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
on 04-20-2016 01:29 PM - edited on 07-27-2021 11:41 PM by APIUserOpsDM
One of the powerful things about the Alteryx Designer is that you can do most things related to your workflow right from within the workflow itself. One such operation is creating a Primary Key for your database table and namely using the Pre-Create SQL and Post- Create SQL options in the Input Data and Output Data tools to do that.
A Primary Key uniquely identifies a record in a database table. The value of a unique identifier, among other benefits, is that it improves database performance and allows updates on the records.
Primary Keys can be made up of one or multiple columns in the table in a database table. However, a table can only have one Primary Key. Primary Keys cannot be null and must be unique and therefore assigning a Primary Key at least consists of two steps: setting the column to Not Null and then setting it to be the Primary key. The values in the target column you’re setting as Primary Key must only have unique values otherwise the database will throw an error. If no column in the table is suitable to be a Primary key, you can use the Alteryx Record ID tool to create such a column.
This article will deal with the case of using one column as the Primary Key. Once you know how to create a one-column Primary Key, you can find many articles online explaining how to create multi-column Primary Keys in SQL.
All of the following examples assume that you know how to connect to your database.
When creating a new table in Alteryx and then saving it on the database, the easiest way is to save the table first and then “alter” the table using Post-Create SQL to set the Primary Key.
The Pre/Post-Create SQL statements are dependent on the database you’re using. In this Article you’ll find examples for SQL Server and Oracle but you can get more examples online.
For SQL Server - Expression 1a:
ALTER TABLE ExampleTest1 ALTER COLUMN PrimaryK int NOT NULL; ALTER TABLE ExampleTest1 ADD PRIMARY KEY (PrimaryK);
For Oracle (10+) – Expression 1b:
ALTER TABLE "ExampleTest1" MODIFY "PrimaryK" NUMBER NOT NULL; ALTER TABLE "ExampleTest1" ADD CONSTRAINT Example_pk PRIMARY KEY ("PrimaryK");
As you’d notice, expressions 1a and 1b are slightly different.
You can confirm that the table now has a Primary Key by using an Input Data tool and checking the Visual Query Builder. The Primary Key will show a key sign next to the field name.
If you already have a table in the database to which you want to assign a Primary Key, there are two cases:
2. The table does not have a column/columns suitable to be a Primary Key:
These are the most common cases of creating Primary Keys and you can use the same logic to create more complex Primary Keys or indeed move some of the SQL table maintenance right into your workflow using the Pre-Create/Post-Create SQL options.
Side note: if you noticed in my screenshots, I have short names for the database connections (if not, check them out). These are Aliases – a neat way to refer to you database connections. If you’re not using them you should check this article.
Here are the most common errors that might point to issues with primary keys:
Make sure a primary key is declared on the table.
You are trying to insert a key that already exists. Make sure you are not inserting duplicates. Is the Primary Key column alphanumeric but not case sensitive? For example SQL server is not case sensitive and EhzA and ehza are considered duplicate.
Note: if the key appears multiple times on the input file and an update option is chosen, the same record will be updated multiple times.
The key is set to auto-increment and Alteryx is trying to insert a value in this column. Deselect primary key column before appending to table and let the database create the value.
The In-DB tools cannot generate the SQL statement needed to update a table that has a key which is set to auto-increment. Either change the way the key is generated in your table or use the regular tools.
For further information, please contact Alteryx Support and one of us will reach out to you.
You can find the workflows used in this article attached to this post. When you open these workflows you will get errors - that's expected because your connection details are different. You'll need to update the connection details and table/column names before using them. These workflows have been created with Alteryx Designer 10.1 (10.1.7.12188).
Fadi, Henriette, Margarita
This has saved my life. Thank you for the post!
This is an excellent post. One question: Is there a way to use the In-Database tools to do the same thing for Oracle? ie. Alter table Add Constraint PK?
I'm struggling getting this to work.
In my workflow I start by pulling data from 3 tables
I need to update a table "summary" in a different database with the results of my sql; however, this "summary" does not have a primary key assigned. I want to use the litigation_id as this is the unique identifier.
I've spent the last few hours going over this. I put the statement in the PreSQL on the output tool then tried the PostSQL. Neither worked so i switched to InputTool. Neither worked. Tried various other tactics but everything ends up in an error. Example: " Output Data (6) Error running PreSQL on "..... Syntax error, expected something like a name or a Unicode delimited identifier or a 'SET' keyword or a 'CONVERT_TABLE_HEADER' k
I'm using the following:
alter table summary
alter column litigation_id int not null;
alter table summary
add primary key (litigation_id);
What am i doing wrong?
Please update this string to include a Hadoop example. Thank you!
@fadib That's a good explanation.
I have a use case where I need to create a unique identifier to sum up the price based on product(It's not available in data set).
Deep diving into use case:
we get data from multiple data sources every night and it stored in our database as one big table similar to below one(Not including Product column, included for reference). For each product we have 3 identifiers ( Supplies, Grocery, Maintenance) but each data provider will only 2 identifiers at given time.
If you look at the below example: Africa uses Primary identifier as supplies and secondary as Grocery
USA uses Primary Identifier as Grocery and Secondary as Supplies.
England uses Primary identifier as Maintenance and Secondary as Grocery.
Technically all this comes under one product so I want find the sum price for product Milkyway . Is there a way I can create a unique identifier which I can use as Primary Key to sum up the values.
Note : At any given time we will have 2 identifiers populated. Any idea how we can resolve this using alteryx ?
Thanks in advance.
Product | Data source | Primary Identifier | Identifier Type | Secondary Identifier | Secondary Identifier Type | Price | Unique Identifier |
MilkyWay | Africa | XYZ123 | supplies | B123456789 | Grocery | 44 | XXXXXXXXX |
MilkyWay | England | ABC12345 | maintenance | B123456789 | Grocery | 12 | XXXXXXXXX |
MilkyWay | India | B123456789 | Grocery | ABC12345 | maintenance | 67 | XXXXXXXXX |
MilkyWay | USA | B123456789 | Grocery | XYZ123 | supplies | 14 | XXXXXXXXX |
MilkyWay | Spain | B123456789 | Grocery | XYZ123 | supplies | 15 | XXXXXXXXX |
Pepsi | UAE | ZZZ123 | Supplies | C123456789 | Grocery | 55 | XXXXXXXXX |
Pepsi | England | C123456789 | Grocery | ZZZ123 | Supplies | 11 | XXXXXXXXX |
Pepsi | India | C123456789 | Grocery | ZZZ123 | Supplies | 22 | XXXXXXXXX |
Pepsi | USA | C123456789 | Grocery | ZZZ123 | Supplies | 33 | XXXXXXXXX |
Pepsi | Spain | C123456789 | Grocery | ZZZ123 | Supplies | 62 | XXXXXXXXX |
Expected Output
Data source | Primary Identifier | Identifier Type | Secondary Identifier | Secondary Identifier Type | Price | Unique Identifier |
Africa | XYZ123 | supplies | B123456789 | Grocery | 44 | RANDOM1234 |
England | ABC12345 | maintenance | B123456789 | Grocery | 12 | RANDOM1234 |
India | B123456789 | Grocery | ABC12345 | maintenance | 67 | RANDOM1234 |
USA | B123456789 | Grocery | XYZ123 | supplies | 14 | RANDOM1234 |
Spain | B123456789 | Grocery | XYZ123 | supplies | 15 | RANDOM1234 |
UAE | ZZZ123 | Supplies | C123456789 | Grocery | 55 | RANDOM345 |
England | C123456789 | Grocery | ZZZ123 | Supplies | 11 | RANDOM346 |
India | C123456789 | Grocery | ZZZ123 | Supplies | 22 | RANDOM347 |
USA | C123456789 | Grocery | ZZZ123 | Supplies | 33 | RANDOM348 |
Spain | C123456789 | Grocery | ZZZ123 | Supplies | 62 | RANDOM349 |
This article was extremely helpful! I do have a question regarding an issue that has come up for me. I had to create my unique column by combining a couple of other columns. This made it necessary to format the column as a v_string value. Now when I try to run the Post-SQL statement, I keep getting errors. I've tried using "char" & "varchar" in the statement, but keep getting the following error:
Error: Output Data (2): Executing PostSQL: "ALTER TABLE "Coverage"
ALTER COLUMN "Key" varchar NOT NULL" : [Microsoft][SQL Server Native Client 11.0][SQL Server]String or binary data would be truncated.[Microsoft][SQL Server Native Client 11.0][SQL Server]The statement has been terminated.
Here is the exact statement I am using. The Key column is a maximum of 11 characters.
ALTER TABLE "Coverage"
ALTER COLUMN "Key" varchar NOT NULL;
ALTER TABLE "Coverage"
ADD PRIMARY KEY "Key"
Any advice?
@mdesmith I think that error is because you are trying to put a value in that column that is more than 11 characters. Run a filter at the end of your workflow that looks at the length of the values.
Another thing that I would suggest is creating a composite key on the table, that is where your primary key consists of a few fields. I used to do the exact method you described but I changed it once I discovered this: https://www.1keydata.com/blog/composite-key-in-sql.html
Essentially, if you are writing your script to create the table, the end will
have something like this: Create Primary Key (Column1,Column2,Column3)
Currently looks like this: Create Primary Key (ConcatenatedColumn)
Clear as mud right?
The SQL error "String or binary data would be truncated" indicates the value you are inserting in that field is too long.
To troubleshoot this, I'd put a Select tool right before the Output tool and explicitly set the "Key" field to a length of 11. My suspicion would be that the process would now run without the SQL error, but you will then get an Alteryx warning on the Select tool that certain records have been truncated within Alteryx. That should identify where the issue is from a data standpoint.
Rod
This is a great article. Is it possible to add additional guidance when writing to Snowflake?
@mcbridewilliam it should work the same way in Snowflake. As long as your tables are created with keys. There are also some things around casing in Snowflake that you need to be aware of. Make sure you are matching your casing.
This is a great thread. However, my primary key is a UPC code, which is 12 characters. I keep getting the "String or binary data would be truncated in table" error when trying to add this as a primary key in SQL. Isn't there a way to stop truncating the field? I need all 12 characters as a primary key...not 11.
Nevermind. I figured it out (add varchar (12) to the sql statement). I'm new to writing Alteryx up to SQL, so it's a daily learning process.
Thanks for the detailed article!
Hello @fadib
Yes, it works with in-memory but not in database, which is frankly a shame...