Tools within a workflow need to be able to run in parallel wherever applicable.
For example: extracting 10 million rows from one source and 12 million rows from a different source, then blending them.
Currently the order of execution is the order in which tools are dragged onto the canvas. Hence Source1 runs first, Source2 second, and then the JOIN.
Here Source1 and Source2 are completely independent, so they could run in parallel, which would cut the workflow execution time.
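To illustrate the idea outside of Alteryx, here is a minimal Python sketch of the pattern: two independent extracts are started at the same time, and the join only waits until both have finished. The functions `extract_source1`, `extract_source2` and `join_on_key` are hypothetical stand-ins with tiny in-memory data, not anything Alteryx provides.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_source1():
    # Hypothetical stand-in for the first (slow) extract.
    return [(1, "a"), (2, "b"), (3, "c")]


def extract_source2():
    # Hypothetical stand-in for the second, independent extract.
    return [(2, "x"), (3, "y"), (4, "z")]


def join_on_key(left, right):
    # Inner join on the first field of each row.
    right_by_key = dict(right)
    return [(key, value, right_by_key[key])
            for key, value in left if key in right_by_key]


def run_blend():
    # Both extracts are submitted up front, so neither waits for the other.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future1 = pool.submit(extract_source1)
        future2 = pool.submit(extract_source2)
        # The join needs both results, so it blocks here until both are done.
        return join_on_key(future1.result(), future2.result())


if __name__ == "__main__":
    print(run_blend())
```

With real sources, the total time would be roughly that of the slower extract rather than the sum of the two, assuming the sources and the network can serve both requests at once.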
Execution time is crucial when you have a tight data loading window.
Hopefully Alteryx considers this in the next release!
Well I got 4 cores, only one utilized...
Looking forward to seeing parallelization...
This is a high priority deal for us as well. As part of our Alteryx evaluation we're loading several different interrelated files with > 10M rows, all sequentially, 7 out of 8 cores idle. It's taking a few hours, which is not workable for our use case. An update on when multi-threading support will be introduced would be great (noting that this request is >2 years old). Even if it's +/- a year, that would help us plan.
FYI, I have moved all my workflows to use in-database tools and this is no longer an issue in that scenario. Try that out if it is an option for you.
@stephs EC2 engine is in beta now, so we are close to the deadline...
@travis in-db is fast, but it's still not parallel, I suppose...
Thanks Atabarezz. If I understand you correctly, this would be a workaround that engages EC2 instances to run multiple copies of the workflow? Our particular constraint there is that we have to run everything on-prem. If a single desktop install can't take advantage of multiple cores, we're kind of dead in the water.
Alteryx never posted in this discussion exactly what they are doing in response. Is there a document describing this solution, or a location in the Community where we can see the evolving requirements they are building to? I'm being lazy here instead of searching out an answer, but it would be good to have a clear understanding of the proposed solution before a lot of development is done, to ensure the solution is on target. This discussion has touched on 3 specific ideas that are likely 3 different solutions.
I was initially looking for the ability within a single workflow to have more than one path processing in parallel, regardless of whether each path is associated with an input. For example, there could be one input tool whose downstream splits into 2 or more paths that process the data in different specific ways and may or may not join the results later. The issue is that those paths will run serially, while we often have the resources to support running both paths simultaneously.
The idea extends when one considers a scenario where a different input tool is at the beginning of each path allowing 2 or more sources to be read simultaneously and process downstream simultaneously until a point where they join.
Now, an extension of the first is to have the parallel processing of tool paths extend to macros one may create and call from a parent flow. One might assume this would work by default: a macro is a dynamic inclusion of a tool path, so if 2 or more tool paths within a flow can run in parallel, why wouldn't macros? I just raise the question, but I learned a long time ago not to assume.
In my mind, a different but valuable issue also raised here is CPU utilization. I have not had much trouble with this unless doing modelling, which is very CPU intensive. But the issue here is that only one of multiple cores is used. That is an issue in the current form of Alteryx, where paths are still serial, and it could have even more impact if Alteryx provides the ability to run more than one tool path simultaneously.
So at this point I am seeing 2 very different issues that would have very different solutions.
I also raised a third desire which is batch pipes. I have used batch piping as a solution on mainframes resulting in huge savings of $ and time. I think you can do it in Unix but I do not know if there is an existing solution in the Microsoft OS world, but that doesn't mean you can't build it...
So which of these ideas are addressed by EC2 engine and in what way does it solve the problem?
Can we receive an update on these? I am most interested in being able to connect to multiple data sources as inputs at the same time. Some of my workflows require 20+ inputs from different data sources. The code for each of these takes ~3 minutes on average. Thus, it takes an hour just to fetch data. I would love to be able to pull the data in parallel.
In case it helps, my workaround to this problem was to use the command tool to run a Python script that uses ODBC to download all the data locally in parallel, then load the local data into Alteryx after the script runs.
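A rough sketch of what such a script could look like is below. It is not the poster's actual script. The function names `download_query` and `download_all`, the CSV output format, and the demo queries are all assumptions. So that the sketch runs anywhere, the demo uses the standard-library `sqlite3` module as a stand-in for ODBC; with an ODBC driver you would presumably pass something like `lambda: pyodbc.connect(conn_str)` as `connect` instead.

```python
import csv
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def download_query(connect, name, sql, out_dir):
    """Run one query on its own connection and save it as <out_dir>/<name>.csv."""
    conn = connect()  # one connection per worker thread
    try:
        cur = conn.cursor()
        cur.execute(sql)
        header = [col[0] for col in cur.description]
        rows = cur.fetchall()
    finally:
        conn.close()
    out_path = Path(out_dir) / f"{name}.csv"
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)
    return out_path


def download_all(connect, queries, out_dir, max_workers=8):
    """Download every query in `queries` (name -> SQL) in parallel."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(download_query, connect, name, sql, out_dir)
                   for name, sql in queries.items()}
        # result() re-raises any error from the worker thread.
        return {name: fut.result() for name, fut in futures.items()}


def demo():
    # Stand-in database (assumption): a throwaway SQLite file instead of ODBC.
    with tempfile.TemporaryDirectory() as tmp:
        db_path = str(Path(tmp) / "demo.db")
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE sales (id INTEGER, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10), (2, 20)])
        conn.commit()
        conn.close()
        queries = {
            "all_sales": "SELECT id, amount FROM sales ORDER BY id",
            "big_sales": "SELECT id, amount FROM sales WHERE amount > 15",
        }
        paths = download_all(lambda: sqlite3.connect(db_path), queries,
                             Path(tmp) / "out")
        return {name: path.read_text() for name, path in paths.items()}


if __name__ == "__main__":
    for name, text in demo().items():
        print(name, text.splitlines())
```

Threads are a reasonable fit here because the work is mostly waiting on the database. The workflow would then read the CSV files from the output folder once the command tool finishes.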
Your workaround sounds like avoidance. You are simply reading and combining data outside Alteryx, and our goal is to do this within Alteryx. I currently do something similar: I break up a flow into sections (tool paths) that would each run independently for an appreciable time and run them in parallel. Then the portion of the flow that joins/appends these sources runs as a final step, a final flow after the others complete. You can do this manually, or in my case I wrote a scheduler that does this automatically.
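For anyone wanting to try the same approach, the core of such a scheduler can be sketched in a few lines of Python. This is only an illustration of the pattern, not the scheduler described above. The names `run_command` and `run_stages` are made up, and the commands in the example are placeholder Python one-liners standing in for whatever command launches each sub-workflow in your environment.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_command(cmd):
    """Run one command to completion and return its exit code."""
    return subprocess.run(cmd).returncode


def run_stages(parallel_cmds, final_cmd, max_workers=4):
    """Run the independent sections in parallel, then the final join step."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each worker thread just waits on its own child process.
        exit_codes = list(pool.map(run_command, parallel_cmds))
    # Only start the final step if every upstream section succeeded.
    if any(code != 0 for code in exit_codes):
        raise RuntimeError(f"upstream step failed, exit codes: {exit_codes}")
    return run_command(final_cmd)


if __name__ == "__main__":
    # Placeholder commands (assumption): replace with the real launch commands.
    section = [sys.executable, "-c", "print('section done')"]
    final = [sys.executable, "-c", "print('final join done')"]
    print(run_stages([section, section], final))
```

The final step is held back until all sections exit cleanly, which mirrors running the join/append flow only after the others complete.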
Ideally, I would not need to break up a flow in this way, and similarly you would not need to perform the extraction and merge/join of initial data outside of Alteryx.
But to jsuptic's point, what is the status? Is there an update, and can we get clarity on exactly how this is being approached... or is it on a shelf or even discarded at this point?
Idea opened in 2015, waiting since then... 2019 today.
No wonder Gartner slapped Alteryx in the latest Magic Quadrant (dated 28.01.2019) for "not being innovative enough" and "lack of vision"...