Tools within a workflow need to be able to run in parallel wherever applicable.
For example: extracting 10 million rows from one source and 12 million rows from a different source in order to blend them.
Currently, the order of execution is the order in which tools are dragged onto the canvas. Hence Source1 runs first, Source2 second, and then the JOIN.
Here Source1 and Source2 are completely independent, so they can be run in parallel, which would cut the workflow execution time.
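To make the idea concrete, here is a minimal Python sketch of the pattern being requested. It is not how Alteryx works internally; `extract` is a hypothetical stand-in for a slow source read, and the row counts and delay are made up for illustration. The two independent reads overlap, and the join waits for both.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def extract(source_name, row_count, delay_seconds):
    """Hypothetical stand-in for a slow source read; returns (key, value) rows."""
    time.sleep(delay_seconds)  # simulates network / query latency
    return [(i, f"{source_name}-{i}") for i in range(row_count)]


def extract_and_join(delay_seconds=0.2):
    # Source1 and Source2 do not depend on each other, so both reads
    # are submitted at once and overlap in time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future1 = pool.submit(extract, "Source1", 5, delay_seconds)
        future2 = pool.submit(extract, "Source2", 3, delay_seconds)
        rows1 = future1.result()
        rows2 = future2.result()

    # The JOIN depends on both inputs, so it runs only after both finish.
    lookup = dict(rows2)
    return [(key, value, lookup[key]) for key, value in rows1 if key in lookup]
```

With both reads taking about the same time, the total wait is roughly one read rather than two, which is the saving described above.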
Execution time is crucial when you have a tight data loading window.
Hopefully Alteryx considers this in the next release!
I second this!
Even if the parallelization were only for input tools and nothing else, it would greatly enhance the efficiency of workflows which source data from multiple databases.
I need this feature very often too! Usually I simply create several workflows and run them in parallel. With a little external code, this parallelization can even be automated fairly easily. In my opinion, however, this is such a clear-cut feature for Alteryx that it doesn't make sense not to have it implemented. Adding it would make the ETL part of the tool more complete.
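For anyone wondering what that "little external code" might look like, here is a rough Python sketch. It assumes the command-line runner `AlteryxEngineCmd.exe` is available in your installation, is on PATH, and accepts a workflow path as its argument; check your own setup and licensing before relying on it. The function names and the `max_parallel` setting are hypothetical, chosen only for this example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Assumption: the Alteryx command-line runner is on PATH under this name.
ENGINE_COMMAND = ["AlteryxEngineCmd.exe"]


def run_workflow(workflow_path, engine_command=ENGINE_COMMAND):
    """Run one workflow in its own process and return (path, exit code)."""
    completed = subprocess.run([*engine_command, workflow_path])
    return workflow_path, completed.returncode


def run_all(workflow_paths, engine_command=ENGINE_COMMAND, max_parallel=4):
    """Run the workflows concurrently, at most max_parallel at a time."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = pool.map(lambda path: run_workflow(path, engine_command), workflow_paths)
        return dict(results)
```

Calling `run_all(["extract_source1.yxmd", "extract_source2.yxmd"])` (hypothetical file names) would start both workflows at once and return a dictionary of exit codes, so a scheduler can decide whether the downstream blending workflow should run.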
It is however my opinion that the feature should not stand alone, but be part of a larger set of features for controlling the execution order of nodes and tool containers. Alteryx would become truly awesome if this was implemented.
I'd like this as well. Thanks.
Thanks for your reply!
I'm very surprised that enterprise clients with tight data loading and analytical processing windows for fetching data from source systems have not requested this parallel processing feature!
I think this would be valuable also. In my mind, the request should be related to scaling processing via multi-threading.
I have a related feature request to allow batch macros to be multi-threaded. Since batch macros know all the possible inputs before the first iteration is run, they theoretically could be processing iterations on multiple threads / cores. This would be an extremely powerful feature, and if implemented only in the context of batch macros, could (maybe?) limit the implementation complexity.
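As a rough illustration of that idea (not of Alteryx's batch macro engine), the Python sketch below hands every iteration to a thread pool up front, since all control values are known before the first one runs. `run_iteration` and the sample record layout are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_iteration(control_value, records):
    """Hypothetical body of one batch iteration: total the matching records."""
    matching = [amount for region, amount in records if region == control_value]
    return control_value, sum(matching)


def run_batch(control_values, records, max_workers=4):
    # Every control value is known before the first iteration starts,
    # so all iterations can be handed to the pool up front.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_iteration, value, records) for value in control_values]
        # Collect in submission order so the output order matches a serial run.
        return [future.result() for future in futures]
```

Because each iteration only reads the shared records and returns its own result, no locking is needed. In CPython, threads mainly help when iterations wait on I/O; for CPU-heavy iterations, `ProcessPoolExecutor` offers the same `submit` interface across cores.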
I think this would be a great addition. Why would this slow people down? I'm not sure I get the reasoning behind not planning it.
Thanks for the continued feedback. Though we're still not planning to fully implement parallel processing for the entire workflow, we are starting to look at how we can extract data from multiple input sources at the same time.
Parallel input data extraction would be a very good starting point and to my organization perhaps the most important node to support parallel processing as :
Hope to see this feature soon!
Any updates on this? There are many situations ripe for parallel processing: inputs, splitting a file into multiple streams to sort it in different ways and process the same data in different contexts, executing multiple summaries from the same source, etc.
The modern PC has multiple cores and multi-threading to support this. If I write two simple workflows to read two different inputs, I can run them concurrently; that is essentially parallel processing. But if I put both in the same workflow, it will serialize them, doubling my read time.
I suspect your code is essentially load-and-go: not precompiled, but loaded and interpreted at runtime, so everything is forced into a single instance of the Alteryx process. I wonder if you could take advantage of piping to build a new type of connector or I/O tool that reads from and writes to another concurrent Alteryx process. Then we could write workflows that are essentially macros, each performing one unit of work (reading, writing, or processing), all of which can run concurrently, passing data through pipes as quickly as the buffers allow.
I've used pipes to connect separate programs, one processing and writing and the other reading and doing further processing. This significantly reduces file management overhead and I/O, and thus wall-clock time.
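For readers who have not used pipes this way, here is a small Python sketch of two separate processes connected by a pipe. It only illustrates the general technique, not anything Alteryx provides; the producer and consumer are tiny hypothetical stand-ins for a "read" unit and a "process" unit.

```python
import subprocess
import sys

# Two tiny stand-in programs, each run as a separate process.
PRODUCER = "for i in range(5): print(i)"
CONSUMER = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(int(line) * 2)\n"
)


def run_pipeline():
    """Connect producer stdout to consumer stdin; return the consumer's output."""
    producer = subprocess.Popen([sys.executable, "-c", PRODUCER], stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        [sys.executable, "-c", CONSUMER],
        stdin=producer.stdout,
        stdout=subprocess.PIPE,
        text=True,
    )
    # Close our copy of the pipe so the consumer sees EOF when the producer exits.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return [int(line) for line in output.split()]
```

No intermediate file is written: the consumer reads rows as the producer emits them, and the operating system's pipe buffer throttles whichever side is faster.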
Rather than reworking the internal code in what might be a major way, maybe this approach would be less disruptive and easier to build in.
Piggybacking on the above comments: this is a barrier to entry for some of my enterprise clients when it comes to using Alteryx. Please implement parallel inputs, especially with in-DB tools.