We would like a built-in process that would search for, and resolve, workflows that are stuck in the "initializing" state. These seem to happen for various reasons, but communication problems between the controller and workers (usually a socket timeout) appear to be the most problematic. It seems that these types of errors should be expected in all but the most stable environments.
Currently, the only tool that we have to solve this problem is to restart the Alteryx Service on the controller. While this works, it tends to cause some collateral damage: other workflows error out or restart from the beginning.
There may be a way to solve this without restarting the service by editing Mongo directly using a tool like Robo 3T, but that is unproven and carries its own risks.
After dealing with this issue and struggling with it for quite some time, we think that the best option is to implement a "clean up" DB process that runs every 5 minutes or so, captures a list of workflows in the "initializing" state, compares that list to the one from the next 5-minute cycle, and fixes any workflows that appear in both lists. We think that returning any stuck workflows to the queued state would be the best fix.
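To make the idea concrete, here is a minimal Python sketch of the two-snapshot comparison. It is only an illustration under assumptions: `StuckWorkflowSweeper`, `list_initializing`, and `requeue` are hypothetical names of our own, and the sketch deliberately does not touch Mongo, since we do not know how the product stores job state internally. A real implementation would supply those two callbacks.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Set


@dataclass
class StuckWorkflowSweeper:
    # Hypothetical callback: returns the IDs of jobs currently "initializing".
    list_initializing: Callable[[], Iterable[str]]
    # Hypothetical callback: returns one job to the queued state.
    requeue: Callable[[str], None]
    # IDs seen as "initializing" during the previous cycle.
    previous: Set[str] = field(default_factory=set)

    def run_cycle(self) -> Set[str]:
        """Run one clean-up cycle and return the IDs that were requeued."""
        current = set(self.list_initializing())
        # A job is treated as stuck only if it was "initializing" in both
        # the previous snapshot and this one.
        stuck = current & self.previous
        for job_id in sorted(stuck):
            self.requeue(job_id)
        # Requeued jobs are dropped from the snapshot so that they get a
        # fresh two-cycle window if they enter "initializing" again.
        self.previous = current - stuck
        return stuck
```

A scheduler would call `run_cycle()` once every 5 minutes or so. Because a job has to show up in two consecutive snapshots before it is touched, a workflow that is initializing normally when the first snapshot is taken gets a full cycle to move on before anything is done to it.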
We just don't want to keep restarting the service to solve this issue and accepting the collateral damage.
Thank you for your consideration