Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Importing a very large number of records into a database

Saraabdi955
8 - Asteroid

Hi,
I have read a large number of files (with differing formats and varying numbers of records) from one directory, and I want to load the resulting data (more than 40 million records) into the database through the ODBC driver, but after a while (about 8 hours) processing and writing to the database becomes very slow.
Can you help me load the records into a table in the database in batches of, for example, 5 million?

7 REPLIES
Qiu
20 - Arcturus

@Saraabdi955 

Maybe we can use a batch macro to commit the data every 5 million?
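
Outside Alteryx, the same idea of committing in fixed-size batches looks roughly like this in Python with pyodbc; the connection string, table, and column names are placeholders, and the 5-million batch size is just the figure from the question.

```python
import pyodbc

BATCH_SIZE = 5_000_000   # commit after every 5 million rows, as suggested above

# Placeholder DSN, table, and columns -- replace with your own.
conn = pyodbc.connect("DSN=MyWarehouse;UID=user;PWD=secret", autocommit=False)
cursor = conn.cursor()
cursor.fast_executemany = True   # batch parameters efficiently (SQL Server ODBC driver)

insert_sql = "INSERT INTO dbo.TargetTable (Id, Col1, Col2) VALUES (?, ?, ?)"

def load_in_batches(rows):
    """rows: any iterable of (Id, Col1, Col2) tuples, e.g. streamed from the files."""
    buffer = []
    for row in rows:
        buffer.append(row)
        if len(buffer) >= BATCH_SIZE:
            cursor.executemany(insert_sql, buffer)
            conn.commit()        # one transaction per batch keeps each commit manageable
            buffer.clear()
    if buffer:                   # flush the final partial batch
        cursor.executemany(insert_sql, buffer)
        conn.commit()
```

In Designer, a batch macro's control parameter would play the role of the batch counter here.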

danilang
19 - Altair

Hi @Saraabdi955 

 

We had this problem with one of our processes. Data updates were taking upwards of 10 hours. When we investigated why it was so slow using the Insert/Append methods available with standard OLE DB/ODBC connections, we found that Alteryx performs primary key checks while inserting the records. Looking at what was happening in the database session, we saw that Alteryx generates a stored proc based on the primary key of the table and calls it to insert the records, performing updates or inserts depending on the values passed. This method is used even if you delete all the records first. It causes the process to slow down as more and more records are added to the table, since the stored proc has to wait for the table indexing to complete before adding each new record.
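
For intuition only, that pattern behaves roughly like a row-by-row keyed upsert; the sketch below (Python with pyodbc, a made-up MERGE statement and table, not the actual proc Alteryx generates) shows why every insert has to wait on a primary-key lookup and index update.

```python
import pyodbc

# Placeholder DSN, table, and columns -- replace with your own.
conn = pyodbc.connect("DSN=MyWarehouse;UID=user;PWD=secret", autocommit=False)
cursor = conn.cursor()

# Hypothetical stand-in for the generated upsert: every row is matched against
# the primary key, so each insert waits on the key lookup and index maintenance.
upsert_sql = """
MERGE dbo.TargetTable AS t
USING (SELECT ? AS Id, ? AS Col1, ? AS Col2) AS s
    ON t.Id = s.Id
WHEN MATCHED THEN UPDATE SET Col1 = s.Col1, Col2 = s.Col2
WHEN NOT MATCHED THEN INSERT (Id, Col1, Col2) VALUES (s.Id, s.Col1, s.Col2);
"""

new_rows = [(1, "a", "b"), (2, "c", "d")]   # example (Id, Col1, Col2) rows
for row in new_rows:
    cursor.execute(upsert_sql, row)         # one keyed round trip per record
conn.commit()
```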

 

To get around this, you can use the bulk import features available with SQL Server and Oracle. If you're using one of these databases, you can configure the Output Data tool to use a bulk connection.

 

In the Data Sources section, pick the Bulk option that corresponds to your database.

 

[screenshot: Data Sources section of the Output Data tool showing the Bulk options]

 

Choose the data source and the table. You'll end up with a config that looks like this:

 

[screenshot: Output Data tool configuration using a bulk connection]

 

With this method, you lose the Update: Warn/Error on Update and Update: Insert if New options (green box), but you get the option to set your transaction size (blue).

 

[screenshot: bulk connection output options, including the transaction size setting]

 

A couple of things to note when using this method:

1. You have to manage the Update/Insert if New logic within your workflow. Since Alteryx isn't calling the stored proc in the background, you'll have to determine how to handle updates to existing records. If you don't have many updates, you can use a standard ODBC connection to update those and a bulk connection to append the new ones (see the sketch after this list). If you have many updates, it may be faster to use the bulk connection with Delete Data and Append to push all the records to the table.

 

2. You'll need "Alter" permission on the table (in SQL Server, anyway) if you're using the Delete Data & Append option. To get the maximum speed, this method performs a Truncate on the table, which isn't allowed if you only have write access to the data.
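
A minimal sketch of the split in point 1, assuming Python with pyodbc and made-up table and column names: the existing primary keys are read once, then incoming rows are routed either to a plain ODBC UPDATE or to a batched append that stands in for the bulk connection.

```python
import pyodbc

# Placeholder DSN, table, and columns -- replace with your own.
conn = pyodbc.connect("DSN=MyWarehouse;UID=user;PWD=secret", autocommit=False)
cursor = conn.cursor()

# 1. Read the primary keys that already exist in the target table.
existing_keys = {row.Id for row in cursor.execute("SELECT Id FROM dbo.TargetTable")}

# 2. Route each incoming record: existing key -> update, new key -> append.
incoming = [(1, "a", "b"), (42, "c", "d")]   # example (Id, Col1, Col2) rows
updates  = [r for r in incoming if r[0] in existing_keys]
inserts  = [r for r in incoming if r[0] not in existing_keys]

# 3. The (usually small) update set goes through the normal ODBC path.
if updates:
    cursor.executemany(
        "UPDATE dbo.TargetTable SET Col1 = ?, Col2 = ? WHERE Id = ?",
        [(c1, c2, pk) for pk, c1, c2 in updates],
    )

# 4. The new records are appended in one large batch, standing in for the
#    bulk connection's Append behaviour.
if inserts:
    cursor.fast_executemany = True
    cursor.executemany(
        "INSERT INTO dbo.TargetTable (Id, Col1, Col2) VALUES (?, ?, ?)",
        inserts,
    )

conn.commit()
```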

 

In our case we found that the most efficient route was to

A. Read all the existing records from the table

B. Handle updates and inserts in the workflow to generate a complete dataset

C. Use Delete Data and Append and push the entire data set back to the table

 

By going this route, we cut the execution time of our workflow from over 10 hours to 15 minutes, including 3 minutes to read all the existing data.

 

If you know that your dataset doesn't modify existing records, you can skip the read and just push the new records using the "Append" option.

 

Dan   

 

Saraabdi955
8 - Asteroid

Thank you so much Dan, @danilang 

If possible, please tell me more about these steps:

"""A. Read all the existing records from the table

B. Handle updates and inserts in the workflow to generate a complete dataset

C. Use Delete Data and Append and push the entire data set back to the table"""

 

 

Actually, I want to know: is your suggestion to use a bulk connection instead of ODBC?

With the same settings?

I need this urgently.
Regards,

 

danilang
19 - Altair

Hi @Saraabdi955 

 

A. Use an Input Data tool to read all the records from the existing table.  

B. The process of figuring out which records are updates and which are appends works like this 

1. Join the existing data (R input) to the new data (L input) on the primary key.

2. For the records on the J output, compare the non-key values from the existing data to the non-key values in the new data. Update any changed values in the existing fields with the values from the new fields, then use a Select tool to remove the new fields.

3. Use a Union tool to combine the corrected J records with the records from the R and L outputs.

C. Use an Output Data tool configured with a SQL bulk connection and set the output options to Delete Data and Append (a rough sketch of these steps is below).
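
Outside Alteryx, a rough sketch of steps A-C using pandas and pyodbc; the connection string, table, and column names are hypothetical, and the three DataFrame slices mirror the Join tool's J, L, and R outputs.

```python
import pandas as pd
import pyodbc

# Placeholder DSN, table, and columns -- replace with your own.
conn = pyodbc.connect("DSN=MyWarehouse;UID=user;PWD=secret", autocommit=False)

# A. Read all the existing records from the table.
existing = pd.read_sql("SELECT Id, Col1, Col2 FROM dbo.TargetTable", conn)

# New data produced by the workflow (example rows).
new = pd.DataFrame({"Id": [1, 42], "Col1": ["a", "c"], "Col2": ["b", "d"]})

# B. Join existing to new on the primary key; the merge indicator plays the
#    role of the Join tool's J, L, and R outputs.
merged = existing.merge(new, on="Id", how="outer",
                        suffixes=("_old", "_new"), indicator=True)

take_new, take_old = ["Id", "Col1_new", "Col2_new"], ["Id", "Col1_old", "Col2_old"]
rename_new = {"Col1_new": "Col1", "Col2_new": "Col2"}
rename_old = {"Col1_old": "Col1", "Col2_old": "Col2"}

j_out = merged[merged["_merge"] == "both"][take_new].rename(columns=rename_new)        # matched: keep new values
l_out = merged[merged["_merge"] == "right_only"][take_new].rename(columns=rename_new)  # new-only keys: inserts
r_out = merged[merged["_merge"] == "left_only"][take_old].rename(columns=rename_old)   # existing-only: unchanged

complete = pd.concat([j_out, l_out, r_out], ignore_index=True)  # Union of J, L, R

# C. Delete Data and Append: clear the table, then push the complete dataset back.
cursor = conn.cursor()
cursor.execute("TRUNCATE TABLE dbo.TargetTable")  # needs ALTER permission, as noted above
cursor.fast_executemany = True
cursor.executemany(
    "INSERT INTO dbo.TargetTable (Id, Col1, Col2) VALUES (?, ?, ?)",
    list(complete.itertuples(index=False, name=None)),
)
conn.commit()
```

In Designer itself, steps A-C are simply the Input Data, Join/Union, and bulk-configured Output Data tools described above.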

 

If you're a DBA, remember to grant Alter permission on the table in SQL Server. If you don't have the access required to grant Alter, work with your DBA team to get it.

 

 

Dan

 

Saraabdi955
8 - Asteroid

Thank you @danilang 

I did this, but the processing speed is still slow...

danilang
19 - Altair
19 - Altair

Hi @Saraabdi955 

 

Which part of the processing is still slow? Reading 40M records from Excel can take quite a long time.

 

If it's the output, can you post the configuration of your output tool?

 

Dan

Saraabdi955
8 - Asteroid

Hi @danilang 

No, reading the files is fast; storing the records in the database (the Output tool) is still slow.

seting.png shows the configured settings, and workflow.png shows the main workflow that is running now...

In 70 minutes, only 34% was imported into the database.
