Alteryx Designer Desktop Ideas

Ability to execute tools in parallel within the same workflow

Tools within a workflow need to be able to run in parallel wherever applicable.

For example: extracting 10 million rows from one source and 12 million rows from a different source to perform blending.

Currently the order of execution is the order in which tools are dragged onto the canvas, hence Source1 first, Source2 second, and then the JOIN.

Here Source1 and Source2 are completely independent, so they can run in parallel, which saves workflow execution time.

Execution time is crucial when you have a tight data loading window.

Hopefully Alteryx considers this in the next release!
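A minimal Python sketch of the idea, outside Alteryx: two independent extracts are submitted to a thread pool at the same time, and only the join waits for both results. The function names and data are hypothetical, and `time.sleep` stands in for I/O-bound extraction latency; it illustrates the scheduling argument, not how the Alteryx engine works.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def extract_source1():
    """Hypothetical stand-in for a slow read from the first source."""
    time.sleep(0.3)  # simulated I/O wait, e.g. a database round trip
    return [(1, "alice"), (2, "bob"), (3, "carol")]


def extract_source2():
    """Hypothetical stand-in for a slow read from the second source."""
    time.sleep(0.3)
    return [(1, 100.0), (3, 42.5), (4, 7.0)]


def join(left, right):
    """Inner join on the first column via a hash table built from `right`."""
    lookup = dict(right)
    return [(key, name, lookup[key]) for key, name in left if key in lookup]


def run_sequential():
    # Source1, then Source2, then the join: total time is the sum of both reads.
    return join(extract_source1(), extract_source2())


def run_parallel():
    # Both extracts start immediately; only the join waits for their results.
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(extract_source1)
        second = pool.submit(extract_source2)
        return join(first.result(), second.result())


if __name__ == "__main__":
    for label, run in (("sequential", run_sequential), ("parallel", run_parallel)):
        start = time.perf_counter()
        rows = run()
        print(f"{label}: {rows} in {time.perf_counter() - start:.2f}s")
```

With the simulated 0.3 s per extract, the sequential run should take roughly 0.6 s and the parallel run roughly 0.3 s; real savings depend on the sources actually being independent and I/O-bound.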

73 Comments
Atabarezz
13 - Pulsar
I second the idea of parallel data inputs. Better still, I'd love to see multi-threading added... Instead of reading row by row starting from the header, could multiple instances read different parts of a file and import it lightning fast?
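Having several readers take different parts of one file is a known technique: split the file into byte ranges, move each boundary forward to the next newline so no row is cut in half, and read the ranges concurrently. A minimal Python sketch, assuming a plain newline-delimited file with no header handling and no quoted fields containing newlines; this is not an Alteryx feature. In CPython, threads mainly help when the reads are I/O-bound; CPU-heavy parsing would need processes instead.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def chunk_boundaries(path, n_chunks):
    """Split a file into roughly equal byte ranges that each end on a newline."""
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            f.seek(size * i // n_chunks)
            f.readline()  # advance past the (possibly partial) current line
            bounds.append(f.tell())
    bounds.append(size)
    # Tiny files can yield empty ranges; drop them but keep coverage contiguous.
    return [(start, end) for start, end in zip(bounds, bounds[1:]) if end > start]


def read_chunk(path, start, end):
    """Read one byte range and return its lines."""
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(end - start)
    # Splitting on newline bytes is safe for UTF-8: b"\n" never occurs
    # inside a multi-byte character.
    return data.decode("utf-8").splitlines()


def parallel_read(path, n_chunks=4):
    """Read all lines of `path` with one worker per chunk, preserving order."""
    ranges = chunk_boundaries(path, n_chunks)
    with ThreadPoolExecutor(max_workers=max(1, len(ranges))) as pool:
        parts = pool.map(lambda r: read_chunk(path, *r), ranges)
        return [line for part in parts for line in part]


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
        tmp.write("".join(f"{i},{'A' if i % 2 else 'B'}\n" for i in range(1000)))
    rows = parallel_read(tmp.name, n_chunks=4)
    print(len(rows), rows[0], rows[-1])  # 1000 0,B 999,A
    os.remove(tmp.name)
```

`pool.map` returns results in submission order, so the rows come back in file order even though the chunks are read concurrently.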
Atabarezz
13 - Pulsar

According to the help file, https://help.alteryx.com/current/index.htm#Command_Line.htm, it seems we can run multiple Alteryx instances with different workflows with the help of the command tool. So, in theory, a macro that auto-generates copies of the same macro with different conditions and calls them could achieve a kind of parallel processing ("hyperthreading"). Anyone want to join in testing? :)

fharper
12 - Quasar

When I read "Command Tool" in Atabarezz's post, I first assumed it meant the "Run Command" tool. That tool lets you run another program external to the workflow, but it is single-threaded with respect to input and/or output between the launched program and the calling workflow. It does not provide a means to multi-thread or parallel process, because the calling workflow holds until it receives the response from the called program.

The command line functionality (not a tool, but what the help file actually references) is something we currently use to run multiple workflows in parallel. In some cases these are workflows that are parts of what would be a larger workflow, broken out to speed up processing time. That effectively allows parallel processing to an extent, but it forces you to exit the Alteryx environment and work within Windows batch or PowerShell to accomplish the task. That is not necessarily a problem if you are comfortable with Windows batch and/or PowerShell, but even if you are, you now have to build in logic to monitor completion and success and then trigger the next dependent workflow, all outside of Alteryx. Alteryx is a great tool, and part of its appeal is its ease of use. It would be much more valuable to a larger audience if we could parallel process within Alteryx and not need to go outside of it for these things.

To illustrate: I built a workflow that reads one input and does some transformations, then reads another input and does some different transformations, then joins the two and does final processing and reporting. With a small test data set it ran in seconds. Against real data, with 10 million rows in each input and transformations on one side that expand the row count by a factor of 24 on average, it took hours. So I broke it into 3 workflows: one for each read plus the initial transformations specific to that input, and one to join and do the final processing. Running the two reads in parallel from batch, then launching the final workflow once both have completed successfully, cuts the run time by 60%.

But I have to do it from batch: I can't have both reads and their downstream transforms active at the same time within a single workflow. I can't make the 2 initial workflows into macros and have them process in parallel from the parent workflow, and the same constraint applies to using the "Run Command" tool within a parent workflow. The parent will only launch one macro or "Run Command" tool at a time.

How sweet would it be if I could launch multiple macros or "Run Command" tools that run in parallel while the parent waits for each to fill its downstream buffer at the next tool, such as a Join or Union. There are several ways to make this work; I just hope Alteryx pursues this and provides a nice clean solution.

A macro and a Run Command tool that runs a workflow are in effect the same, but I tend to like the ease of using a macro for workflows. So I wish we had a new type of macro that a parent workflow would recognize as one to fire without waiting. The parent would move on to any other macros of that type and fire them too, then wait for the first macro output tool that starts filling buffers and cue off that to start working normally from that point. When it reaches a point where it needs input from one of the other parallel macros, it waits until that stream is flowing, as a Join or Union would. It leverages a known method and only requires a little queuing and monitoring logic under the covers. But I am only theorizing.
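The "fire without waiting, then wait at the join" idea maps onto a familiar producer/consumer pattern. A minimal Python sketch, assuming each parallel branch can be modelled as a thread that pushes rows into its own bounded queue (standing in for the downstream buffer): both branches start immediately, the join builds a lookup from one stream while the other keeps filling its buffer, and it blocks only when it needs rows that have not arrived yet. All names are hypothetical; this illustrates the scheduling idea, not Alteryx internals.

```python
import queue
import threading

_END = object()  # sentinel that marks the end of a stream


def produce(rows, out_q):
    """Hypothetical parallel branch: push rows downstream as soon as they exist."""
    for row in rows:
        out_q.put(row)  # blocks only if the bounded buffer is full
    out_q.put(_END)


def drain(in_q):
    """Yield rows from a queue until the end-of-stream sentinel arrives."""
    while (row := in_q.get()) is not _END:
        yield row


def streaming_join(left_rows, right_rows, buffer_size=1000):
    left_q = queue.Queue(maxsize=buffer_size)
    right_q = queue.Queue(maxsize=buffer_size)
    branches = [
        threading.Thread(target=produce, args=(left_rows, left_q)),
        threading.Thread(target=produce, args=(right_rows, right_q)),
    ]
    for branch in branches:
        branch.start()  # fire both branches without waiting for either
    # Build the lookup side first; meanwhile the left branch keeps filling its buffer.
    lookup = {}
    for key, value in drain(right_q):
        lookup.setdefault(key, []).append(value)
    # Stream the left side through the lookup, blocking only when its buffer is empty.
    joined = [(key, lv, rv) for key, lv in drain(left_q) for rv in lookup.get(key, [])]
    for branch in branches:
        branch.join()
    return joined
```

With a small `buffer_size`, the left branch simply pauses when its buffer fills and resumes once the join starts consuming it, so neither side deadlocks.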

fharper
12 - Quasar

Forgive the many typos.

Atabarezz
13 - Pulsar

Well done, we may have some improvements over that to automate it further...

Would you mind sharing the workflow? Unfortunately, I guess the Ideas section doesn't let us share files,

so it may be better to open up a discussion; maybe more people will participate and help improve the solution...

Cheers

fharper
12 - Quasar

I sent you my contact info so we can have a conversation and decide where to go from here

Atabarezz
13 - Pulsar
  • Created a 1-million-row Excel file with a segment column containing A and B as categorical values.
  • Prepared two simple workflows: one filters A, the other filters B, and each outputs to a different Excel file.
  • Created a *.bat file that calls AlteryxService.exe with the names of the .yxmd files:

----

AlteryxService.exe addtoqueue=BatchTest.yxmd,localhost,secret
AlteryxService.exe addtoqueue=BatchTest2.yxmd,localhost,secret

----

When you run the file, two Alteryx instances are created simultaneously and two files are extracted, which are straightforward to join afterwards...

I'll test this on SQL Server as well, with push-through SQL, and compare a single data extract with threaded extracts.

Make sure you change the settings to allow multiple simultaneous runs.
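For a comparison between a single extract and threaded extracts, one common way to split one table is into N non-overlapping partitions, for example by a modulo on an integer key, with each partition fetched on its own connection. A minimal Python sketch using the standard-library `sqlite3` module as a stand-in for SQL Server; the table `sales` and key `id` are hypothetical. It assumes the key is a non-negative integer (both SQLite and SQL Server return a negative remainder for a negative dividend), and it interpolates identifiers directly, so never pass untrusted input. Whether this beats a single query against a real server depends on indexes, server load, and network.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from contextlib import closing


def partition_queries(table, key, n):
    """One query per bucket; assumes `key` is a non-negative integer column."""
    # Identifiers are interpolated directly: never pass untrusted input here.
    return [f"SELECT * FROM {table} WHERE {key} % {n} = {k}" for k in range(n)]


def fetch(db_path, sql):
    # Each worker opens (and closes) its own connection instead of sharing one.
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(sql).fetchall()


def parallel_extract(db_path, table, key, n=4):
    """Run the n partition queries concurrently and concatenate the results."""
    queries = partition_queries(table, key, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        parts = pool.map(lambda sql: fetch(db_path, sql), queries)
        return [row for part in parts for row in part]


if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, segment TEXT)")
        conn.executemany(
            "INSERT INTO sales VALUES (?, ?)",
            [(i, "A" if i % 2 else "B") for i in range(10_000)],
        )
        conn.commit()
    rows = parallel_extract(db_path, "sales", "id", n=4)
    print(len(rows))  # 10000
```

Rows come back grouped by partition rather than in key order, so sort afterwards if order matters.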

Atabarezz
13 - Pulsar

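For those who don't have the scheduler, here is a *.bat file with 3 workflows this time, separated with pipes. Be aware that cmd.exe starts all commands in a pipeline at the same time, so these extracts run concurrently rather than sequentially; separate the commands with & (or && to stop at the first failure) if you want them to run one after another.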

---

"C:\Program Files\Alteryx\bin\AlteryxEngineCmd.exe" BatchTestA.yxmd | "C:\Program Files\Alteryx\bin\AlteryxEngineCmd.exe" BatchTestB.yxmd | "C:\Program Files\Alteryx\bin\AlteryxEngineCmd.exe" BatchTestC.yxmd
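Going the batch route means building your own logic to monitor completion and success before triggering the dependent workflow, as fharper notes earlier in the thread. A minimal Python sketch of that logic: start each read workflow at the same time, wait for all of them, check exit codes, and only then launch the final workflow. The engine path comes from the batch line above; the workflow names are hypothetical, and treating any non-zero exit code as a failed run is an assumption (check how your Alteryx version reports warnings versus errors).

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Engine path from the batch file above; the .yxmd names below are hypothetical.
ENGINE = r"C:\Program Files\Alteryx\bin\AlteryxEngineCmd.exe"


def run(cmd):
    """Run one command to completion and return its exit code."""
    return subprocess.run(cmd).returncode


def run_stage(commands):
    """Start every command in a stage at once and wait for all of them."""
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        return list(pool.map(run, commands))


def run_pipeline(reads, final):
    """Run the read workflows in parallel; launch `final` only if all succeeded."""
    codes = run_stage(reads)
    if any(code != 0 for code in codes):  # assumption: non-zero means failure
        print(f"read stage exit codes {codes}; skipping the final workflow")
        return False
    return run(final) == 0


if __name__ == "__main__" and os.path.exists(ENGINE):
    ok = run_pipeline(
        reads=[[ENGINE, "ReadSourceA.yxmd"], [ENGINE, "ReadSourceB.yxmd"]],
        final=[ENGINE, "JoinAndReport.yxmd"],
    )
    sys.exit(0 if ok else 1)
```

Each command is passed as an argument list rather than a shell string, so paths with spaces such as `Program Files` need no extra quoting.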

fharper
12 - Quasar

Just to follow up on Atabarezz's posts above: nice illustration. You have created an example of using the command line to spawn multiple instances of Alteryx that run in parallel. As I mentioned in a prior post, that is a method we and others use to get around the lack of in-workflow parallelization.

An important point I forgot to mention in my prior post, and one relevant to Atabarezz's, is that you must have either a Server license or a Designer license with the Scheduler option to run from the command line. Unless the most recent release has changed this, you need Scheduler capability to run AlteryxEngineCmd.exe from the command line, and therefore from batch. So this is not an option for users with Designer-only licenses, which is another reason to get a true built-in Alteryx capability.

We built our own scheduler around the command line capability that comes with the Scheduler option, rather than using the Alteryx scheduler, because the Alteryx scheduler is analogous to the Windows Task Scheduler and so is a bit limited. We needed more complex trigger scenarios, including predecessor constraints. Because we often run 10 or more concurrent processes at peak times on the main PC we run our scheduled processes from, our machines have large amounts of RAM and the setting "workflows allowed to run simultaneously" is set to 10 or more. If anyone has questions on batch processing with Alteryx or building a more full-featured scheduler, feel free to connect at the Tampa User's group or contact me directly.

Thanks, Atabarezz, for posting an example.
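A minimal Python sketch of a dependency-driven runner like the home-grown scheduler described above (not fharper's actual implementation): each job is a zero-argument callable returning True on success, for instance a hypothetical wrapper around a `subprocess.run` call to AlteryxEngineCmd.exe; `deps` lists each job's predecessors; and `max_concurrent` plays the role of the "workflows allowed to run simultaneously" setting. A job starts only after all of its predecessors succeed, and anything downstream of a failure is skipped.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def _succeeded(future):
    """Treat an exception inside a job the same as a False return value."""
    try:
        return bool(future.result())
    except Exception:
        return False


def run_dag(jobs, deps, max_concurrent):
    """Run jobs in dependency order with at most max_concurrent at a time.

    jobs: name -> zero-argument callable that returns True on success.
    deps: name -> set of predecessor names that must succeed first.
    Returns the set of job names that completed successfully.
    """
    done, failed, pending, running = set(), set(), set(jobs), {}
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        while pending or running:
            # Skip every job downstream of a failure (repeat for transitive chains).
            blocked = {n for n in pending if deps.get(n, set()) & failed}
            while blocked:
                pending -= blocked
                failed |= blocked
                blocked = {n for n in pending if deps.get(n, set()) & failed}
            # Launch every job whose predecessors have all succeeded.
            for name in [n for n in pending if deps.get(n, set()) <= done]:
                pending.discard(name)
                running[pool.submit(jobs[name])] = name
            if not running:
                break  # leftovers wait on unknown jobs or a cycle; they never run
            finished, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in finished:
                name = running.pop(future)
                (done if _succeeded(future) else failed).add(name)
    return done
```

For the two-reads-then-join setup from earlier in the thread, `deps` would be `{"join": {"read_a", "read_b"}}`; the thread pool size caps how many jobs run at once, and extra ready jobs simply queue until a worker frees up.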

chadanaber
7 - Meteor

Has there been any movement on this idea within Alteryx?  This would be excellent functionality to include.