During our "busy season", the web frontend of the Alteryx Gallery sometimes crashes and returns HTTP 404 until the Alteryx Service is manually restarted.
The Alteryx Engine keeps running in the background, but users complain because they cannot track the progress of their workflows in the Web UI.
Restarting the Alteryx Service requires forcibly closing all Alteryx Engine processes.
I found someone with the same problem in the community, but no solution has been posted yet:
https://community.alteryx.com/t5/Alteryx-Server-Discussions/Gallery-goes-inaccessible-frequently-404-File-or-directory-not/m-p/465984/highlight/false#M4727
An investigation with one of the Alteryx Support Specialists revealed that this crash is caused by a lack of CPU resources on the server.
(In our case the CPU is exhausted by a third-party app that our Alteryx workflows use, but the same can happen with R or Python scripts.)
From an enterprise-level server application, I expect robustness even when hardware capacity runs low.
I am OK with the Gallery being unavailable or slowing down while the CPU is fully occupied.
But I expect an enterprise application to recover once resources are available again without losing any information or progress.
I would like the Alteryx Gallery to stop crashing when the CPU runs at 100% for an extended period, or at least to restart automatically after a crash.
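
Until something like this exists in the product, an external watchdog could serve as a stopgap for the "restart automatically" part. The following is only a minimal sketch under stated assumptions: the Gallery URL, the Windows service name `AlteryxService`, the thresholds, and all function names are hypothetical and not confirmed by Alteryx. It polls the Gallery and restarts the service only after several consecutive failed checks, so a single slow response under high CPU load does not trigger a restart.

```python
import subprocess
import time
import urllib.error
import urllib.request

GALLERY_URL = "http://localhost/gallery/"  # hypothetical; use your own Gallery URL
SERVICE_NAME = "AlteryxService"            # assumed Windows service name
FAILURES_BEFORE_RESTART = 3                # consecutive failed checks before acting
POLL_SECONDS = 60


def gallery_status(url, timeout=10):
    """Return the HTTP status code, or None if no HTTP response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 while the Gallery frontend is down
    except OSError:
        return None      # connection refused, timeout, DNS failure, ...


def restart_service(name):
    """Restart a Windows service via `net stop` / `net start` (needs admin rights)."""
    subprocess.run(["net", "stop", name], check=False)  # may already be stopped
    subprocess.run(["net", "start", name], check=True)


def next_failure_count(count, status):
    """Reset the counter on HTTP 200, otherwise count one more failure."""
    return 0 if status == 200 else count + 1


def watch(check=gallery_status, restart=restart_service,
          sleep=time.sleep, max_polls=None):
    """Poll the Gallery and restart the service after repeated failures.

    Runs forever when max_polls is None. Returns the number of restarts.
    """
    failures = polls = restarts = 0
    while max_polls is None or polls < max_polls:
        failures = next_failure_count(failures, check(GALLERY_URL))
        if failures >= FAILURES_BEFORE_RESTART:
            restart(SERVICE_NAME)
            restarts += 1
            failures = 0
        polls += 1
        sleep(POLL_SECONDS)
    return restarts
```

Calling `watch()` from a script started by a scheduled task would keep the check running. Note that this does not address the actual request: as described above, restarting the service forcibly closes the running Alteryx Engine processes, so workflow progress is still lost.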