Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

How to execute different instances of same workflow in parallel

Dkundu
6 - Meteoroid

Hello - I have a use case to execute an Alteryx workflow in parallel. I have a workflow parameterized by a single element; it pulls data from Postgres and Teradata, performs a comparison, and creates an output file. The workflow is highly optimized, but it still takes about 1 hour to process the comparison, as the average data volume is 50M rows. We now want to execute the workflow in parallel, say 5 parallel processes with different parameters, so that we can perform 5 comparisons in 1 hour rather than spending 5 hours. Could you please let me know how to do this in Alteryx?

6 Replies
messi007
15 - Aurora

@Dkundu,

 

Have you tried converting the workflow into a macro with Interface tools inside? Then you can call the macro as many times as you want and change the parameters for each call.

You can open the Data Cleansing tool to see an example, since it is itself a macro. It should give you an idea :)

 

[Screenshot: messi007_0-1653460376196.png]

Hope this helps,

Regards

Ladarthure
14 - Magnetar

Hi @Dkundu,

 

I would not agree with @messi007: wrapping the logic in a macro (or any other tool) in a workflow does not by itself parallelize it, and a lot depends on the content of the macro (the tools inside it). Most of the time, the longest part of the process is extracting the data from the servers (in your case Teradata and Postgres) to your environment. If the extracted data is the same for every run, a few possibilities come to mind:

  1. Extract the raw data once and store it locally (for example as a .yxdb), then run your comparisons against the stored copy. Each comparison will take less time, and you can multiply the tests as much as you want since the data is already extracted.
  2. Use the In-Database tools if your joins stay within a single database; they let you build a SQL query without coding. BUT you won't be able to join Teradata and Postgres with those tools; you would first have to extract from one system into a temporary table on the other.

If your data is different every time, I don't really see another way, since the most time-consuming step will be the extraction of the data.

 

Generally, I would advise you to run your workflow with the Enable Performance Profiling option (in the Runtime settings) to see which action in your workflow consumes the most time. Once you have found it, you may be able to factor it out with ease!

 

 

Dkundu
6 - Meteoroid

Thanks for your reply. For each run my data is different, so we make the database call well ahead of time and place the result in a file on the Alteryx Server. Our requirement is just to call the same workflow multiple times, maybe 5, one for each data set or input. Please note we are getting the input in spreadsheets. Do you think your macro will work for my use case?

I am thinking that if we save the job 5 times under different names and link each copy to a different spreadsheet, they can run in parallel. Since we are keeping the data in files, our goal is to use the AMP engine on top of it.
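If you do save five copies of the workflow, each pointing at its own spreadsheet, one way to start them side by side is from outside Designer with one engine process per copy. A minimal sketch in Python, assuming the engine command line (`AlteryxEngineCmd.exe`) is available on the machine; the install path and workflow file names below are placeholders to adapt, not tested values:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical paths -- adjust to your install and your saved workflow copies.
ENGINE = r"C:\Program Files\Alteryx\bin\AlteryxEngineCmd.exe"
WORKFLOWS = [rf"C:\jobs\compare_run{i}.yxmd" for i in range(1, 6)]

def build_cmd(engine, workflow):
    """Command line for one engine instance running one workflow copy."""
    return [engine, workflow]

def run_all(engine, workflows, max_parallel=5):
    """Launch one engine process per workflow and wait for all of them."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = pool.map(
            lambda wf: subprocess.run(build_cmd(engine, wf)).returncode,
            workflows,
        )
    return list(results)

if __name__ == "__main__":
    # Five engine instances run concurrently, so wall time is roughly
    # one comparison, not five -- at the cost of 5x memory and CPU.
    print(run_all(ENGINE, WORKFLOWS))
```

Each copy should also write to its own output path so the parallel runs don't collide on the same file.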

Ladarthure
14 - Magnetar

Hi again @Dkundu,

 

here is a diagram of how I would do it. The first model is for when you have multiple source files for your tests, meaning multiple workflows to process those tests. The second model is a bit more straightforward: you have only one data source for all the tests, you output it as a .yxdb to improve performance, and then you run one workflow containing all the tests you might need.

 

[Screenshot: Ladarthure_0-1653479506020.png]

 

Dkundu
6 - Meteoroid

My problem is a little different. Let's say I have 10 run IDs, like R1, R2, ... R10, and one Alteryx flow. If I push R1 into the flow, it takes 2 hours to process. After that I have to push R2 and wait another 2 hours, so with 10 run IDs I would wait 20 hours. All I want is to complete the whole process in 2 hours by running the same workflow 10 times with different run IDs, as parallel streams at the same time.
