Tools within a workflow need to be able to run in parallel wherever applicable.
For example: extracting 10 million rows from one source and 12 million rows from a different source in order to blend them.
Currently the order of execution is the order in which tools are dragged onto the canvas: Source1 first, Source2 second, and then the JOIN.
Here Source1 and Source2 are completely independent, so they can run in parallel, which saves workflow execution time.
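To make the request concrete, here is a minimal Python sketch of the idea, not a description of how the Alteryx engine works. The function names (`extract_source1`, `extract_source2`, `join_on_key`, `run_workflow`) and the tiny row sets are hypothetical stand-ins for the two extracts and the JOIN.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_source1():
    # Hypothetical stand-in for the first extract (the 10 million row source).
    return [(1, "a"), (2, "b"), (3, "c")]


def extract_source2():
    # Hypothetical stand-in for the second extract (the 12 million row source).
    return [(2, "x"), (3, "y"), (4, "z")]


def join_on_key(left, right):
    # Inner join on the first field; keeps the order of the left input.
    right_by_key = dict(right)
    return [(key, value, right_by_key[key]) for key, value in left if key in right_by_key]


def run_workflow():
    # The two extracts do not depend on each other, so both are submitted
    # at once; the join only starts after both results are available.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future1 = pool.submit(extract_source1)
        future2 = pool.submit(extract_source2)
        return join_on_key(future1.result(), future2.result())


if __name__ == "__main__":
    print(run_workflow())
```

With I/O-bound extracts, the wall clock time of the two reads would be roughly that of the slower one rather than the sum of both.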
Execution time is crucial when you have a tight data loading window.
Hopefully Alteryx considers this in the next release!
I completely understand that some ideas can take a long time to implement. All I and most others on this thread are after is an update on the progress of the feature review, even if the update is that no progress has been made.
Here is a response from our PM organization regarding this idea.
We are beginning to introduce parallelism of Input reads with e2, starting with support for Excel (all formats), csv, yxdb, mdb and SQLite formats. Note that for csv and yxdb we can read a single file in parallel (chunking), whereas the other formats are read in parallel only if multiple input files of the same type are present (e.g., four Excel Input Data tools). This is a complex issue requiring smart data synchronization, and we will continue to work to support other formats/databases as we move forward with the e2 effort. If you would like to test this functionality, please join the e2 Beta Program, as we would appreciate your input.
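For readers unfamiliar with the term, "chunking" a single file can be sketched roughly as below. This is an illustration in Python under stated assumptions, not the e2 implementation: `parse_chunk` and `read_csv_chunked` are hypothetical names, the data is held in memory as a string, and no quoted field is assumed to contain a line break.

```python
import csv
from concurrent.futures import ThreadPoolExecutor


def parse_chunk(lines):
    # Parse one block of raw csv lines into rows of fields.
    return list(csv.reader(lines))


def read_csv_chunked(text, chunk_size=2, workers=4):
    # Assumption: every record sits on exactly one line, so the body
    # can be split on line boundaries without breaking a record.
    lines = text.splitlines()
    header = next(csv.reader(lines[:1]))
    body = lines[1:]
    chunks = [body[i:i + chunk_size] for i in range(0, len(body), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the order the chunks were submitted,
        # which keeps the rows in their original file order.
        parsed = list(pool.map(parse_chunk, chunks))
    rows = [row for chunk in parsed for row in chunk]
    return header, rows


if __name__ == "__main__":
    print(read_csv_chunked("id,name\n1,a\n2,b\n3,c\n"))
```

Threads are used here only to keep the sketch short. In CPython, CPU-bound parsing would normally need a process pool to see a real speedup.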
Well, great explanation. Thanks for the information.
Thanks for relaying the response @DanM. I'm looking forward to seeing the feature in Alteryx. I'll follow up again for an update in a few months.
I was deflated to discover that Alteryx single-threads the whole workflow. Coming from an SSIS background, where there are separate control flow and data flow layers, I was presuming Alteryx had SSIS beat in that way like it had in every other. Live and learn. Are there any community-developed methods?
@DanM I think something has been lost in all the conversations. I originally opened an idea, and starred and commented on others' similar ideas, with the main purpose of having multiple paths run in parallel, not simply multithreading an input read. Dan's comment "we can read a single file in parallel (chunking)" leads me to believe they are working on speeding up input reads, but I am after getting the work that follows the Input tool to run in parallel with other data streams. In my example, I had a flow with 2 inputs from SQL, which were then blended and massaged by a number of tools before joining. Each of the streams would process for about 2 hours before joining, and the "read" from SQL was only half of that. Once joined, the remaining processes ran another hour or 2.
I broke that flow into 3 separate flows and ran the first 2 at the same time, reducing the wall clock time by 2 hours. I want parallelism within the flow so I don't have to break it into many smaller parts, but maybe that is not doable.
Another wish item was piping 2 or more flows together. In my youth I worked on batch systems on IBM mainframes, and we used "batchpipes", a software feature that allowed us to link different programs together and run them at the same time: one writes and the other reads as the first writes. It dramatically reduced IO, processing time and cost because we did not have to write a file for the next program(s) to read; they simply read from the pipe while the data was being written. This is a simplistic explanation, but if you could replicate batchpipes on your system, everyone processing large data volumes will go nuts.
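The piping idea can be sketched in Python with a bounded in-memory queue standing in for the pipe. This is only an analogy under assumed names (`producer`, `consumer`, `run_piped`), not BatchPipes itself or anything Alteryx provides: the first stage writes records while the second reads them, and no intermediate file is written.

```python
import queue
import threading

# Sentinel object that marks the end of the stream.
_DONE = object()


def producer(pipe, count):
    # First "program": writes records into the pipe one at a time.
    for record in range(count):
        pipe.put(record)
    pipe.put(_DONE)


def consumer(pipe, results):
    # Second "program": reads records as they arrive and processes them.
    while True:
        record = pipe.get()
        if record is _DONE:
            break
        results.append(record * 2)


def run_piped(count=5):
    # A small maxsize makes the writer wait when the reader falls behind,
    # so only a few records are ever held in memory at once.
    pipe = queue.Queue(maxsize=2)
    results = []
    writer = threading.Thread(target=producer, args=(pipe, count))
    reader = threading.Thread(target=consumer, args=(pipe, results))
    writer.start()
    reader.start()
    writer.join()
    reader.join()
    return results


if __name__ == "__main__":
    print(run_piped())
```

Both stages run at the same time, and the bounded queue is what replaces the file that the first flow would otherwise have to finish writing before the second could start.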