I have an Alteryx server with several workers on it. There have been times when all the workers get occupied with long-running jobs, and the queue builds considerably. Sometimes these jobs are running long because there is a problem with them that is not throwing an error or causing a failure; essentially they are hung. I would like to be made aware of this situation when it happens instead of relying on users to tell me that their jobs haven't run for hours.
I can do this kind of monitoring by running workflows on the server, but in my situation a monitoring workflow would not run because all the workers are occupied. Therefore, I am looking for a solution that does not require running anything on the server itself. Has anyone implemented anything like this? If so, can you describe your solution? The best I've come up with is using something like Azure Data Factory to call the Alteryx Server API for the list of jobs, then counting how many have a Queued status or checking how long each currently running job has been running. Thanks in advance for any advice you can provide.
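To make the idea concrete, here is a rough Python sketch of the check I have in mind. It covers only the evaluation logic, and it would run somewhere outside the Alteryx server. It assumes the job list has already been fetched from the API. The field names (`id`, `status`, `startedAt`), the status strings, and the thresholds are placeholders of mine, not the actual Alteryx response schema, so they would need to be mapped to whatever the API really returns.

```python
from datetime import datetime, timedelta, timezone


def summarize_jobs(jobs, now, max_runtime=timedelta(hours=2)):
    """Return (queued_count, ids_of_running_jobs_older_than_max_runtime).

    `jobs` is a list of dicts. The keys "id", "status" and "startedAt"
    are hypothetical placeholders, not the real Alteryx schema.
    """
    queued = 0
    long_running = []
    for job in jobs:
        if job["status"] == "Queued":
            queued += 1
        elif job["status"] == "Running":
            # Assumes an ISO 8601 timestamp with an explicit UTC offset.
            started = datetime.fromisoformat(job["startedAt"])
            if now - started > max_runtime:
                long_running.append(job["id"])
    return queued, long_running


def should_alert(queued, long_running, max_queued=10):
    """Alert when the queue is too deep or any job looks hung."""
    return queued > max_queued or bool(long_running)


# Made-up sample data standing in for an API response.
sample_jobs = [
    {"id": "a", "status": "Queued", "startedAt": None},
    {"id": "b", "status": "Running", "startedAt": "2024-01-01T08:00:00+00:00"},
    {"id": "c", "status": "Running", "startedAt": "2024-01-01T11:30:00+00:00"},
    {"id": "d", "status": "Queued", "startedAt": None},
]
sample_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sample_queued, sample_long_running = summarize_jobs(sample_jobs, sample_now)
```

With the sample data, job `b` has been running for four hours and would be flagged, while `c` (thirty minutes) would not. The external scheduler would then send a notification whenever `should_alert` returns true.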