I need an overview of some files located on a drive to which the Alteryx server will not be allowed access (so no, I can't make use of the server to increase processing speed, even if I'd like to do that). The drive is in the TB-range, and runtimes for subfolders will in some cases be more than an hour when running the relevant workflows from the designer. I am aware that while a workflow is running in the designer I can still work on other stuff, related to other workflows, inside the designer, but I'm not clear on exactly how Alteryx allocates the system resources available among the different active workflows.
The basic question is this: If I have, say, two different folders I need to process, will I get any performance benefit from running two different Alteryx flows in parallel inside the designer (so that both workflows are running at the same time), compared to if I'd just run the flows sequentially, one after the other? So for example if I have two workflows that each, if run on its own with no other stuff going on inside the designer, take an hour to process in Alteryx, will the total run time be 2 hours regardless of whether I run the flows simultaneously or not? Are there any relevant 'rules of thumb' here, aside from 'try to avoid having Alteryx switch to virtual memory'?
Hi @BYJE
I'll try and give a short answer first, and allow you to follow up with questions if need be 🙂
Alteryx is able to run many workflows at once and you control the memory limit given to each workflow. This allows you to run 2 at once, and given the resources, they would finish in the same time it would take to run just the one (assuming they take the same amount of time).
The bit you need to factor in is "given the resources": if you don't have the RAM to share, the two workflows may take as long as, or longer than, they would have if you ran them in series (due to potential memory conflicts and, in the worst case, errors).
A rule of thumb to apply to the memory setting for Alteryx is
(Available RAM / 2) / Number of Parallel Workflows
So if you had a machine with 16GB RAM: divide by 2 to leave room for Windows and other things, then divide by 2 again if you want 2 to run at once, leaving each workflow with 4GB RAM.
These settings are available within Options > System Settings > Edit System Settings
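If it helps to see the rule of thumb as arithmetic, here is a minimal Python sketch of it. The function name `memory_per_workflow_gb` and its parameters are hypothetical, made up for illustration, and it assumes "Available RAM" means the machine's total RAM, as in the 16GB example above. It only does the division; it does not read or change any Alteryx setting.

```python
def memory_per_workflow_gb(total_ram_gb: float, parallel_workflows: int) -> float:
    """Rule of thumb from this thread: (Available RAM / 2) / Number of Parallel Workflows.

    Hypothetical helper for illustration only; not part of Alteryx.
    """
    if parallel_workflows < 1:
        raise ValueError("parallel_workflows must be at least 1")
    # Halve the RAM to leave room for Windows and other processes,
    # then split the remainder evenly across the workflows run at once.
    return (total_ram_gb / 2) / parallel_workflows


# The 16GB machine running 2 workflows at once, as in the example above.
print(memory_per_workflow_gb(16, 2))  # 4.0
```

The result is only a starting point for the memory setting, to be adjusted after watching how the workflows actually behave.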
Thanks for the response @JoeS - this was quite helpful, I'll accept it as a solution.
I'll of course try to avoid inefficient processing, but in case I find myself in such a setting, perhaps without knowing it ('because I encountered an unusually-sized thumb'): Can you tell me a little more about the sort of errors related to suboptimal processing secondary to insufficient available RAM which might occur - mainly, are these data errors (faulty data, e.g. nulls instead of digits) or are they 'Alteryx errors' which might lead to no output being generated? I ask because in this particular setting I don't care very much about the former, as long as the error rate is reasonably low, but I'd potentially care about the latter. If a few individual files are not handled correctly and this leads to incorrect output that doesn't really matter in this case - this is a highly dynamic file system I'm exploring, so running the same flow in the morning and the afternoon would give far from identical results anyway - but if a run that's been processing for 50 minutes fails to produce any output because Alteryx stops the flow because of a runtime error, that will matter.
It's going to be the latter part. You may encounter memory pipe errors and this will halt the workflow and cause it to fail.
So I'd recommend being sensible with the setting and keeping your wits about you with that thumb of yours!
Having worked with Alteryx for 10+ years, I have never seen it create data errors due to being overloaded, and I was a customer putting 6+ billion records through it in one daily process, where the checks would have spotted this.
If you do go with my formula that should definitely give you a good base to start from, and if you find that you aren't getting the performance you need, my recommendation would be to add more RAM to the machine itself after playing a little with the settings.