Alteryx Designer Desktop Discussions

Insert into Oracle Database hosted by AWS

Winston
7 - Meteor

I'm working on a project where I need to write approximately 190k records to an Oracle DB hosted on AWS on a daily basis. I don't have any issues reading from or writing to the database as such; the problem is speed. It is taking 10 hours to upload 190k records. I know a remotely hosted DB will be slower, but I wasn't expecting it to be this slow. I'm wondering if anyone has suggestions for speeding up the process, either in how I'm using Alteryx or in how the database is configured.
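
One way to narrow down whether the bottleneck is Alteryx or the network/database path is to time a plain bulk insert from outside Designer. Below is a minimal sketch using the python-oracledb driver; the connection string, table, and columns are placeholders, and the chunk size is only a starting point.

```python
# Standalone timing test, independent of Alteryx. Everything below
# (connection details, table, columns) is a placeholder.
import time
import oracledb  # python-oracledb, thin mode; no Oracle client install needed

rows = [(i, f"value_{i}") for i in range(190_000)]  # dummy payload of ~190k rows

with oracledb.connect(user="USER", password="PASS",
                      dsn="aws-host.example.com:1521/ORCLPDB1") as conn:
    with conn.cursor() as cur:
        start = time.perf_counter()
        # Array-bind in chunks so each network round trip carries many rows.
        for i in range(0, len(rows), 10_000):
            cur.executemany(
                "INSERT INTO staging_table (id, val) VALUES (:1, :2)",
                rows[i:i + 10_000],
            )
        conn.commit()  # one commit at the end
        elapsed = time.perf_counter() - start

print(f"{len(rows):,} rows in {elapsed:.1f}s "
      f"({len(rows) / elapsed:,.0f} rows/s)")
```

If a script like this also takes hours, the limit is the network path or the DB instance itself rather than anything in the workflow.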

For our initial data load my workflow is simple:

[Image: initial data load workflow]

 

The database output tool configuration is below.  This is using the standard Oracle OCI driver.

 

[Image: Output Data tool configuration, Oracle OCI driver]

 

I set the transaction size to a low number to see if it made a difference compared to the extremely high default value. It didn't.
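
For intuition only: the transaction size mostly controls how many commits the load is broken into, and the exact behavior depends on the driver, but the arithmetic looks roughly like this.

```python
# Rough commit-count arithmetic for a 190k-row load at different transaction sizes.
rows = 190_000
for txn_size in (5, 1_000, 100_000):
    commits = -(-rows // txn_size)  # ceiling division
    print(f"transaction size {txn_size:>7,}: about {commits:>6,} commits")
```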


Is there a better Alteryx tool to use, or perhaps a better driver?

 

Thanks for any advice you can provide!

 

-Winston

5 REPLIES
Yoshiro_Fujimori
15 - Aurora

Hi @Winston 

If you want to reduce the amount of data moved between Alteryx and your database, have you considered using the In-DB tools?

https://help.alteryx.com/20223/designer/database

 

 

Yoshiro_Fujimori
15 - Aurora

If you use AWS, you may find this blog post interesting.

The Ins and Outs of In-DB: Do Something Awe-Inspiring with AWS

 

 

apathetichell
18 - Pollux

So a few issues:

1) Do you have enough RAM? Do you have enough RAM allocated to Alteryx?

2) Where are you getting this data from? Is it from a different DB? How long does it take to run your query and return the results in memory? Is it faster via In-DB? Note that to go from DB1 to DB2 (two different DBs) you would need to go in-memory (Data Stream Out/In), but if both are on Oracle in AWS, you would not need to.

3) Are there other latency issues (other users) affecting either DB?

4) What is the size of your Oracle DB footprint? (Is it on an EC2 instance? How large is that instance?)

5) Can you share your driver configuration (ODBC 64-bit) and your Output tool settings / Manage In-DB Connections setup? There could be something wrong on the driver side...

 

Also worth asking: what's up with the Spatial Object field set to [None]? How large are these objects? Have you tried the bulk loader?

 

Also, Transaction Size should be 0. A value of 5 would mean roughly 36,000 separate commits on a 180k-record upload...

Winston
7 - Meteor

@Yoshiro_Fujimori @apathetichell

 

Thanks for the suggestions. Status update is below.

 

Update: Using a bulk data loader I have cut the time down to 1.5 hours, which is still unacceptable. I have also tried In-DB: if I read in all the records from the table and immediately write them to another table, it takes 22 seconds. But if I add a Browse In-DB tool (configured to show all data), it takes 1.25 hours.
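
For context on why the table-to-table copy is so fast: the In-DB write stays set-based on the server, roughly equivalent to a single statement like the sketch below (connection details and table names are placeholders), while a Browse In-DB has to pull every row back over the network to Designer so it can be displayed.

```python
# Rough equivalent of the 22-second In-DB table-to-table copy: one set-based
# statement that runs entirely on the Oracle server, so no rows travel back
# to Designer. Table names and connection details are placeholders.
import oracledb

with oracledb.connect(user="USER", password="PASS",
                      dsn="aws-host.example.com:1521/ORCLPDB1") as conn:
    with conn.cursor() as cur:
        cur.execute("INSERT /*+ APPEND */ INTO target_table "
                    "SELECT * FROM source_table")
    conn.commit()
```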

 

Obviously In-DB is the way to go (without adding Browse tools to monitor data flow), but I'm running into the issue that the In-DB palette doesn't include all the tools I'm using, e.g. Append Fields, Fuzzy Match, etc. Since acquiring the data is what takes the bulk of the time, using the Data Stream Out tool to do the appends and fuzzy matching and then Data Stream In to write back to the database seems like it would double the run time of the workflow.

 

Any suggestions on how to get around streaming the data out and then back in, just to use Append or Fuzzy Match?

 

Thanks

apathetichell
18 - Pollux

Something is configured wrong. Your Browse In-DB tool should not be adding that much load to your process. Are you logging everything by accident? Can you share your ODBC 64-bit config? What driver are you using?

 

You are correct that in order to use Alteryx-specific tools (e.g. Fuzzy Match) you will need to stream the data out... For Append there are workarounds... I think I mentioned that this is really RAM-specific: what kind of machine are you running (virtual or physical)? What's the memory allocation? When you query, how much of a hit do you see in your resource manager? Are you on a VPN?
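
As one possible append workaround (a sketch only, with made-up names): the equivalent of an Append Fields step can often be expressed as a cross join that stays inside Oracle, for example as custom SQL in a Connect In-DB tool rather than run from a script.

```python
# Sketch of an "Append Fields"-style step done inside Oracle instead of
# streaming out: cross join a one-row parameter set onto every row of the
# main table. All names are made up; in Alteryx this query could be pasted
# into a Connect In-DB tool instead of being executed from Python.
append_sql = """
    SELECT t.*,
           p.load_date,
           p.batch_id
    FROM   main_table t
    CROSS JOIN (
        SELECT SYSDATE AS load_date,
               42      AS batch_id
        FROM   dual
    ) p
"""
print(append_sql)
```

Whether this fits depends on what the Append step actually adds, so treat it as a pattern rather than a drop-in replacement.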

 

My first hunch, given the Browse issue (unless you drastically increased the default of 100 records), is that something is configured off in your ODBC setup.
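
If it helps to rule out the driver outside of Alteryx, a few lines of Python with pyodbc will list the ODBC drivers registered on the machine and confirm which one a DSN actually resolves to. The DSN and credentials below are placeholders.

```python
# Quick driver sanity check outside Alteryx. DSN and credentials are placeholders.
import pyodbc

print(pyodbc.drivers())  # the expected Oracle ODBC driver should appear here

conn = pyodbc.connect("DSN=MyOracleDSN;UID=USER;PWD=PASS", timeout=10)
print(conn.getinfo(pyodbc.SQL_DRIVER_NAME),
      conn.getinfo(pyodbc.SQL_DRIVER_VER))
conn.close()
```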
