Hi all,
I'm trying to figure out the cause of an issue with our internal Alteryx server to avoid re-creating this problem.
We had a series of jobs running successfully every two hours for about a week. Last weekend, two of them ran LONG (as in, for days). This is strange because:
- There's no reason either of these jobs should run for more than 1.5 hours.
- My understanding is that our server has a 5-hour time limit set.
- When we tried to cancel these jobs manually from the server, the task status would change to "cancelling" and then revert to "running" on a refresh.
- This happened on two different worker nodes, one with a Friday evening job, the other with a Saturday afternoon job.
The "solution" to this was for our sys admin to reboot those two worker nodes. I haven't been able to find logs for the long-running jobs.
If anyone has any thoughts about what might cause this, and where to look so we can avoid recreating it, that would be marvellous.