Alteryx Designer is an amazing data tool, but its partner, "the Scheduler", needs some upgrades. The Scheduler interface that pops up from Alteryx Designer needs a complete makeover, but I'm not going to address that here. Instead I'll focus on functionality that, if delivered, would make the Scheduler much more useful.
Currently I read our Scheduler data from MongoDB with an Alteryx workflow and visualize it in Tableau to see what's happening on the Scheduler. We refer to this dashboard frequently to check the health of our company's data pipeline. I'll share both files soon.
Here are my top 5 feature requests for the Scheduler.
Workflow priority ranking. When two or more workflows are scheduled to run at the same time, a ‘priority’ value sets the order of execution. Priority is set when the workflow is scheduled, with values from 1 to 100. If no priority is set, the default is 50. The ‘priority’ field can be read from MongoDB.
Why: At midnight we set off several workflows. We want to centrally manage which runs first based on a common ‘priority’ field.
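To make the proposed behavior concrete, here is a minimal Python sketch of how a scheduler could order workflows that are due at the same moment. It is not Alteryx code: ScheduledWorkflow, execution_order, and the workflow names are hypothetical. It also assumes that a higher number runs first, because the request does not say which end of the 1 to 100 range wins.

```python
from dataclasses import dataclass
from typing import Optional

# Proposed default when a workflow is scheduled without a priority.
DEFAULT_PRIORITY = 50


@dataclass
class ScheduledWorkflow:
    """Hypothetical stand-in for one scheduled workflow."""
    name: str
    priority: Optional[int] = None  # 1..100, or None if not set

    def effective_priority(self) -> int:
        p = DEFAULT_PRIORITY if self.priority is None else self.priority
        if not 1 <= p <= 100:
            raise ValueError(f"priority must be between 1 and 100, got {p}")
        return p


def execution_order(due_now: list[ScheduledWorkflow]) -> list[str]:
    """Order workflows that are due at the same time.

    Assumption: a higher number runs first. sorted() is stable, so
    workflows with equal priority keep the order they were scheduled in.
    """
    ranked = sorted(due_now, key=lambda wf: -wf.effective_priority())
    return [wf.name for wf in ranked]


midnight = [
    ScheduledWorkflow("publish_dashboards", priority=10),
    ScheduledWorkflow("load_sales"),  # no priority set, so it gets 50
    ScheduledWorkflow("refresh_reference_data", priority=90),
]
print(execution_order(midnight))
# ['refresh_reference_data', 'load_sales', 'publish_dashboards']
```

Ties at equal priority fall back to scheduling order. A real implementation might break ties differently.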
Restrict which controller and workers a specific workflow can run on. By default, a workflow can run on any server. When scheduling, a workflow can be restricted from executing on specific servers. This is stored in a field called ‘restrict’ that lists the servers it cannot execute on.
Why: Some workflows can run only on the main controller because of file system references. Also, a worker can be tuned for CPU or disk I/O, and some workflows benefit from that tuning. Sending a disk-I/O-intensive workflow to a server tuned for disk I/O would speed up our workflows.
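Here is one way the ‘restrict’ field could drive server selection, as a hypothetical Python sketch. The restrict set is treated as a deny list of servers, as described above. The tuned_for and profile tags are assumptions added to illustrate the CPU versus disk I/O point. Neither tag is an existing Alteryx setting.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Server:
    name: str
    tuned_for: str = "general"  # hypothetical tag, e.g. "cpu" or "disk_io"


@dataclass
class Workflow:
    name: str
    restrict: set[str] = field(default_factory=set)  # servers it must NOT run on
    profile: str = "general"  # hypothetical workload type


def candidate_servers(wf: Workflow, servers: list[Server]) -> list[str]:
    """Servers the workflow may run on, best match first."""
    eligible = [s for s in servers if s.name not in wf.restrict]
    if not eligible:
        raise RuntimeError(f"{wf.name} is restricted from every server")
    # False sorts before True, so servers tuned for this workload come first;
    # the stable sort keeps the remaining servers in their original order.
    ranked = sorted(eligible, key=lambda s: s.tuned_for != wf.profile)
    return [s.name for s in ranked]


servers = [Server("controller"), Server("worker-cpu", "cpu"), Server("worker-io", "disk_io")]

# Uses file system paths that only the controller can see.
file_bound = Workflow("month_end_files", restrict={"worker-cpu", "worker-io"})
print(candidate_servers(file_bound, servers))  # ['controller']

# Disk-heavy extract: prefer the disk-I/O worker, fall back to the others.
extract = Workflow("big_extract", profile="disk_io")
print(candidate_servers(extract, servers))  # ['worker-io', 'controller', 'worker-cpu']
```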
Set a sequence of workflows, where each one starts only after the previous one completes successfully.
FYI: We used the Runner tool for a short time to solve this, but we quickly learned that it behaves like a bull in a china shop, and it brought our server down. As it stands today, the Runner tool is not an option for production work.
Why: This would allow you to run several workflows one after another. For example, the first would read from a data source, the second would perform calculations on the data, and the third would publish the data. Each workflow is given a ‘workflow-number’ that can be seen in the Scheduler list and read from MongoDB.
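The sequencing rule is "start the next workflow only if every earlier one succeeded." Here it is as a small Python sketch. The lambdas are hypothetical stand-ins for real workflow runs, and run_sequence is an illustrative name, not a Scheduler API.

```python
from typing import Callable

Step = tuple[str, Callable[[], bool]]  # (workflow name, function that runs it)


def run_sequence(steps: list[Step]) -> list[tuple[str, str]]:
    """Run steps in order; each starts only if every earlier step succeeded."""
    log: list[tuple[str, str]] = []
    for i, (name, run) in enumerate(steps):
        ok = run()
        log.append((name, "success" if ok else "failed"))
        if not ok:
            # Do not start anything downstream of a failure.
            log.extend((later, "skipped") for later, _ in steps[i + 1:])
            break
    return log


# Hypothetical read -> calculate -> publish pipeline.
pipeline = [
    ("read_source", lambda: True),
    ("calculate", lambda: False),  # simulate a failure
    ("publish", lambda: True),
]
print(run_sequence(pipeline))
# [('read_source', 'success'), ('calculate', 'failed'), ('publish', 'skipped')]
```

The key design point is that a failed step marks everything after it as skipped rather than running it. This means publish never runs against incomplete data.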
Retry failed workflows. If a workflow fails, you can set the number of attempts it gets to run successfully. Attempts greater than 1 are reported in a new field called ‘attempts’ that can be read from MongoDB.
Why: Some workflows fail but succeed when run again. This includes failures caused by locked files and by workflows that depend on processes outside of Alteryx.
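Here is a minimal Python sketch of the retry behavior, including the ‘attempts’ count the Scheduler would report. run_with_retries and its wait_seconds pause are hypothetical. The pause is an assumption meant to give a locked file time to be released, and the request itself does not ask for it.

```python
import time
from typing import Callable


def run_with_retries(run: Callable[[], bool], max_attempts: int = 3,
                     wait_seconds: float = 0.0) -> dict:
    """Re-run a failing workflow up to max_attempts times and report attempts used."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        if run():
            return {"status": "success", "attempts": attempt}
        if attempt < max_attempts:
            time.sleep(wait_seconds)  # optional pause, e.g. for a locked file
    return {"status": "failed", "attempts": max_attempts}


# Simulated flaky workflow: fails twice (say, a locked file), then succeeds.
outcomes = iter([False, False, True])
print(run_with_retries(lambda: next(outcomes), max_attempts=5))
# {'status': 'success', 'attempts': 3}
```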
Execution time limit. If a workflow runs longer than X minutes, the Scheduler kills the workflow and reports a workflow error with a unique code, ‘execution-limit’, that can be read from MongoDB. The default is 90 minutes, and it can be set to any number of minutes. Each workflow can have its own limit.
Why: Some workflows start to hog resources and need to be killed. When a new workflow is added, this is a good way to protect the rest of the scheduled workflows.
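Here is a Python sketch of a per-workflow time limit, assuming each workflow is launched as an external command. The cmd list is a stand-in, and this is not how the Scheduler actually starts workflows. Python's subprocess.run(..., timeout=...) kills the child process when the timeout expires and then raises TimeoutExpired. The sketch turns that into the proposed ‘execution-limit’ error code.

```python
import subprocess
import sys

# Proposed default limit; each workflow could override it.
DEFAULT_LIMIT_MINUTES = 90


def run_with_limit(cmd: list[str], limit_minutes: float = DEFAULT_LIMIT_MINUTES) -> dict:
    """Run one workflow command and stop it if it exceeds its time limit."""
    try:
        result = subprocess.run(cmd, timeout=limit_minutes * 60)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child process here.
        return {"status": "error", "code": "execution-limit"}
    if result.returncode != 0:
        return {"status": "error", "returncode": result.returncode}
    return {"status": "success"}


# A Python one-liner stands in for a long-running workflow; 0.01 minutes = 0.6 s.
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
print(run_with_limit(slow, limit_minutes=0.01))
# {'status': 'error', 'code': 'execution-limit'}
```

One caveat applies to this approach. It kills only the process it launched, not any processes that process spawned, so a real scheduler would need to terminate the whole process tree.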
Currently it takes four clicks and three displays (including the current Scheduler display I'm attempting to post as well) just to temporarily disable a job. It would be much more convenient to have a one-click icon on this display that does this.