Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Workflow Performance Issue

Akash2093
7 - Meteor

Hi

 

I have developed a workflow comprising more than 70 tools, in which multiple tables are read from Teradata and MS SQL Server databases.

 

The output of the workflow is written directly to a staging table in an MS SQL Server database using the standard Output Data tool over an ODBC connection (I tried OLE DB, but it didn't help much). The workflow typically processes ~700k rows with ~80 columns, and the runtime is 1 hour 20 minutes, which is far too long to deploy as a solution for the client. According to the performance profiler, the Output tool accounts for more than 75% of the processing time.
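For intuition on why the ODBC write can dominate: a plain ODBC insert path tends to send rows to SQL Server in small batches (often one statement per row), while a bulk-style path sends large parameter arrays per round trip. The sketch below illustrates that difference outside Alteryx using Python and pyodbc; the DSN, table, and column names are placeholders, not anything from the actual workflow. If your Designer version offers a SQL Server bulk loader connection option in the Output Data tool, the same principle applies and it may be worth testing.

# Hypothetical illustration (not the Alteryx internals): row-by-row ODBC inserts
# versus a batched insert for ~700k rows. DSN, table, and column names are made up.
import pyodbc

conn = pyodbc.connect("DSN=StagingSqlServer;Trusted_Connection=yes")
cur = conn.cursor()

rows = [(i, f"value_{i}") for i in range(700_000)]  # stand-in for the workflow output

# Slow path: one round trip to SQL Server per row.
# for row in rows:
#     cur.execute("INSERT INTO dbo.stg_output (id, val) VALUES (?, ?)", row)

# Much faster path: send large parameter arrays in each round trip.
cur.fast_executemany = True
cur.executemany("INSERT INTO dbo.stg_output (id, val) VALUES (?, ?)", rows)

conn.commit()
conn.close()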

 

However, when I store the same output in a CSV file and feed it to a workflow in which only the Output tool (writing to the MS SQL staging DB) is configured, it completes within 4 minutes. Likewise, the workflow run without the Output tool takes around 15 minutes to complete all the transformations. So separately the two parts need 15 + 4 = ~20 minutes, but put together they take 1 hour 20 minutes.

 

The ultimate goal is to publish and schedule the workflow on the Gallery. This is my utmost priority, as all the hard work will be in vain if the workflow takes that long to run.

 

Any suggestions are most appreciated.

2 REPLIES
PhilipMannering
16 - Nebula

So it's the transfer of data (in the Data Stream In and Data Stream Out In-Database tools) that takes the longest. I think you have a couple of options to make it run quicker, but I'm interested in others' solutions.

1) You do everything using the In-DB tools. This might be tricky with ~70 tools.

2) You do most of it in-DB. In particular, filter, select, and summarize so that the number of columns and rows passed into your local environment (via Data Stream In) is reduced. The more you can reduce the size of the dataset going through Data Stream In/Out, the quicker your workflow will run. A rough sketch of this idea follows the list below.

3) You run a workflow that dumps the data (say, on a daily basis) to a .yxdb file on a shared drive, and a separate workflow uses this .yxdb file. This will be considerably quicker, but it won't use up-to-the-minute data from the database.
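To make option 2 concrete, here is a rough sketch of the same principle outside Alteryx: push the filtering and aggregation to the database so that only the reduced result crosses the wire, which is effectively what the In-DB tools generate for you. The connection string, table, and column names below are placeholders, not anything from the original workflow.

# Hypothetical sketch of the "reduce before you stream" idea using Python/pyodbc;
# table and column names are invented for illustration.
import pandas as pd
import pyodbc

conn = pyodbc.connect("DSN=SourceDatabase;Trusted_Connection=yes")

# Anti-pattern: pull the full wide table locally and reduce it afterwards.
# wide = pd.read_sql("SELECT * FROM dbo.big_source", conn)

# Better: let the database filter and summarize, so only the small result
# streams into the local environment (the moral equivalent of Data Stream In
# placed after In-DB Filter/Select/Summarize tools).
query = """
    SELECT customer_id, SUM(amount) AS total_amount
    FROM dbo.big_source
    WHERE load_date >= '2024-01-01'
    GROUP BY customer_id
"""
reduced = pd.read_sql(query, conn)
conn.close()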

 

These are the options I know of. I'm interested to hear what others say.

 

Akash2093
7 - Meteor

Thanks Philip for your response.

 

The first two options require considerably more time, which I am short of at the moment since the workflow is already developed.

 

Regarding dumping to an Alteryx database file and feeding that to the output: the staging DB was created specifically to avoid a dependency on a file sitting on a shared drive. Going back to that approach might raise questions and would leave the available resources under-utilized.

 

I am also waiting to see more responses. 🙂

 

Best Regards
