Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Database upload suddenly taking hours instead of minutes, any ideas why?

Philip
12 - Quasar

I'm attempting to drop and replace a table on our SQL Server database. I've done this before, but today it's taking hours instead of its normal 5-10 minutes. The data set is about 2GB in size. I'm receiving these errors, which I've never seen before. The only difference is I've changed a field data type from V_WString to a 64 character String.

 

Output Data (2)    File Format does not match translated type.
Output Data (2)    Alias translated to odbc:DRIVER={SQL Server Native Client 11.0};DATABASE=TheDB;SERVER=OurServer;Trusted_Connection=yes|||TableName
Output Data (2)    ODBC Driver version: 03.80

 

Any ideas why I'm receiving these errors and/or why it's suddenly taking so much longer to upload? Are they related?

11 REPLIES 11
Joe_Mako
12 - Quasar

V_WString and WString are UTF-16 Unicode fields.
V_String and String are ISO 8859-1 Latin-1 fields.

 

Generally it is a good idea to match the data type of the table you are uploading to.

 

What prompted you to make the change from V_WString to String? If you just need to ensure character length is 64, then you might try WString instead.

Philip
12 - Quasar

When querying the database, I received the following error.

 

Select DB.[Program for reporting],

DB.[MSG_ID /opu_r25_id]
From DB WHERE
DB.[Program for reporting] = 'SIV Surveillance'

 

Msg 402, Level 16, State 1, Line 1
The data types text and varchar are incompatible in the equal to operator.

 

I was trying to see if changing the [Program for reporting] data type to String would correct the problem. I have to admit, I'm just learning SQL Server ins and outs, so this may not be the issue.

Joe_Mako
12 - Quasar

Edit: I looked up SQL Server data types and I found: https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-types-transact-sql

 

text data type in SQL Server is depreciated, and should not be used.

 

Here is what I would expect:

SQL Server Data Type = Alteryx Data Type

varchar = V_String

char = String

nvarchar = V_WString

nchar = WString

Philip
12 - Quasar

I converted it to V_String and the query works now. I'm wondering if SQL Server sees a string in single quotes as a non-unicode string and the unicode was causing the error.

Philip
12 - Quasar

Thanks for your help @Joe_Mako. Not sure what's happening under the hood with SQL Server, but as long as it works, I'm good!

HenrietteH
Alteryx
Alteryx

@Philip 

 

Not all data types in Alteryx line up exactly with data types used in databases, so we do our best approximation to match them. 

For string data types, the way @Joe_Mako outlines them above is how they are matched. 

 

Generally speaking, when you are loading data to a database, you want to make sure that you do your best to match the datatypes to what you find in the database to avoid errors on load. 

For example, if your column in the database is only 2 characters long, you don't want to load a 255 character string field to it. 

If your data contains unicode characters, use a wide string. 

I would avoid fixed length strings in most cases as they are padded with blank spaces to match the length of the string. 

Henriette Haigh
Technical Product Manager - Data Connectors
Alteryx, Inc.

SeanAdams
17 - Castor
17 - Castor

Hey @Philip

 

Depending on which version of SQL you are using, there are a few things to note.

  • Firstly - to figure out what version of SQL you're on - you can open up a query window to your SQL database and type Select @@Version.   It'll give you a result like the screenshot below.    The reason the SQL version is important is that the data-types have changed subtly as SQL has matured (for example the Text and Memo columns have gone; and SQL has broadly followed the evolution of the ISO SQL standards as they have grown and matured.   :-) I certified on SQL Server 9.5 when the ISO SQL was SQL-92, and I'm presently studying to re-certify on the newer generation of SQL

2017-06-26_10-56-45.png

  • Second: Unicode strings in MS-SQL (also called T-SQL) are prefixed by n.  For example nVarchar is the equivalent of V_WString in Alteryx
  • Third: Text as a type has been killed - you can now use far larger VARCHAR columns (they used to be limited to 5000 characters at one stage, but now you can use several thousands, or even use Varchar(Max))
  • Fourth: There are some great tools that come out the box with SQL - in fact the fastest way to get data into SQL is to use the import tools.   If you have the Enterprise manager tools installed, you can just search your start menu for "Import" and it should find them.  
    • The reason for this is that Alteryx inserts every row (or batch of rows if you've turned on transaction size) through the SQL Execution engine, which in turn triggers a call to the indexer to re-index these rows.  So every insert you do is then followed by a collection of reindex operations; and any triggers on your table are fired. 
    • In comparison, the SSIS (SQL SErver Integration Services - previously known as Data Transformation Services / DTS), can do a bulk copy operation - where they copy directly into the data pages, bypassing the query engine, the indecies, triggers etc - and pump it straight into data pages.
    • Comparison is HUGE.   For a 350M row table we had to re-copy from DB2 - doing this through alteryx took 6 or 7 days, doing this with SQL importer took 5 hrs
  • Fifth: Make friends with Index Optimizer and Query Profiler
    • One of the best things about MS SQL is that they have some pretty awesome tools built in, but you have to be a DBO (database Owner) or SA (sys admin) to use them.   They are essentially Database Administrators in a box.
    • The profiler listens to the DB, and records EXACTLY what was asked of it over time.  You can store this in a table or just observe it.   this is TREMENDOUSLY helpful because it's not easy to get under the hood to understand the EXACT query that eitehr Alteryx or TAbleau is running, so you don't know how to optimize your indices or partitions etc.
    • The Index Optimizer can either take a single query that you're struggling and give you recommendations to tune your DB; or it can take the output from the profiler (which you can leave running for a day), to give you recommendations to tune your DB for an entire day or week's actual load.   This is fantastic, because now you can tune your DB for exactly the load that you're placing on it, and don't need to spend time trying to guess, or reverse-engineer your tableau or alteryx canvasses.
  • Sixth: IF you're on SQL 2016 - make the switch to column-store data.   This is a huge topic, but we generally get one or 2 orders of magnitude improvement in analytical query performance by using a clustered column-store rather than row-store; and the tables are about 10% of the size due to the fact that Column-store is naturally compressed.   There's loads here to learn, but the summary is that column-store is the most efficient way of storing analytical data and serving analytical query loads, and it's available from 2012 (non-clustered) or 2016 (clustered - i.e. changing the physical storage of the data on the disk)

 

I have my own SQL 2016 server installed at home (MSDN subscriber) and am more than happy to help with SQL questions - and your other alternative if you're curious is that you can create an account on Azure, where you can set up an Azure SQL Server where you only pay for the processor cycles you use - which is a great way to learn the platform.   All the basic SQL tools work on that platform, so you can connect to it just like as if you were on your own home machine.

 

Good luck @Philip - let me know if I can help

Sean

Philip
12 - Quasar

Thanks, @SeanAdams

 

I really appreciate the detailed information.

 

About #4, these data update daily with an Alteryx process. Currently it takes less than 10 minutes to insert. I've been looking for a way to do only additional records, but the SQL Server system I retrieve data from is designed to ingest data, some of which may be updates or duplicates, so I've been performing a Delete Data & Append process. Due to its daily update, I'd rather have the upload as part of the process.

 

I'm using the SQL Server Bulk Loader and that has been a huge improvement in speed. Would transaction size adjustments have much impact on upload speed?

SeanAdams
17 - Castor
17 - Castor

Hey Philip,

 

The transaction size impacts how broadly the database will generate locks, so if your database is under a constant heavy read load, then it will take time to get a confirmed lock.    This is heavily dependant on how your data is structured, and the data read load.

 

For example:

- If I'm inserting 20 000 rows, and I'm always inserting using an increasing primary key; and the primary key is also my clustered index (clustered = physical storage order on the disk), then I'm probably only going to be updating the last page of the table.   Because we have a clustered index on the increasing primary key - I'm always inserting on the last page so it probably won't take me long to get a dedicated write-lock on this page (assuming very few people are reading the last page).

- However if your clustered index is on something more random (like a date field, or a product ID in a sales table), then 20 000 rows may be hitting hundreds of different pages in your table.  Because it's writing, it needs to wait to get a write-lock on those pages, which requires any in-process reads to complete.   And it will hold that write lock until it's finished.   So unless all your readers are using uncommitted reads (dirty reads), then a big update across multiple pages can easily lock many folk out of the table for a long while.

 

hope this makes sense?

 

Out of interest - have you found a way to invoke the SQL bulk loader from within an Alteryx process, or have you just scheduled it using the SQL agent or something similar?   Reason for asking is that I'd dearly love to do a native bulk load through Alteryx, but haven't found out how to do it yet.

 

Cheers Philip - have a good Monday

Sean

Labels