Hello,
Recently, our team encountered and resolved an issue caused by Microsoft Server patch KB4338815 on our Alteryx Server (hosted on Windows Server 2012 R2). We were experiencing outages with no apparent trigger (no workflows running, low-usage periods, etc.), and I wanted to share our solution in case the same problem is affecting other users.
We experienced several outages with the following behavior:
- The service logs would record a ping failure for our worker processes:
WARN,7040,AlteryxService,GalleryManager,,,,,,,"GalleryDaemon_Ping_Failed: Response code <400>"
WARN,7040,AlteryxService,GalleryManager,,,,,,,"GalleryManager_DoWork_PingFailed: Ping failed, fail count now <1>."
- Ping failures would increase, one by one, until the gallery shut down (about an hour after the first failure)
- After the gallery shut down, we were unable to stop or restart the AlteryxService (it stayed in a state of "Stopping")
- The only way to fix the issue was to completely reboot the server
- Alteryx worked normally after the reboot, until a ping failed again and the cycle repeated at seemingly random intervals (~every 1-2 weeks)
- The application logs and server events show no apparent cause for why the pings continue to fail
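If you want to check your own service logs for the same pattern, here is a minimal Python sketch. It assumes your log lines contain the same "GalleryManager_DoWork_PingFailed" message text shown above; the function names and the threshold of 3 are my own illustrative choices, not anything defined by Alteryx.

```python
import re

# Matches the ping-failure message quoted in the log lines above and
# captures the number inside the angle brackets.
PING_FAILED = re.compile(
    r"GalleryManager_DoWork_PingFailed: Ping failed, fail count now <(\d+)>"
)


def ping_fail_counts(lines):
    """Return the fail counts, in order, from lines that report a ping failure."""
    counts = []
    for line in lines:
        match = PING_FAILED.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts


def is_escalating(counts, threshold=3):
    """Return True once any fail count reaches the threshold.

    The threshold is an arbitrary value for illustration; tune it to
    how early you want to be warned.
    """
    return bool(counts) and max(counts) >= threshold
```

To use it, read your service log file line by line (the location depends on your own server configuration), pass the lines to ping_fail_counts, and alert when is_escalating returns True. That gives some warning before the gallery goes down.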
Working with our internal IT team, we determined that the cause was Microsoft Server patch KB4338815, which had been applied to our server a week before the first outage. The same patch was associated with identical behavior in several other applications across the company. Microsoft has since released KB4345424 to address these problems, and it has resolved them for several of our applications, including Alteryx. Once KB4345424 is applied, the issue goes away.
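To see quickly where a server stands with respect to these two patches, something like the following Python sketch may help. It assumes PowerShell is available on the server and uses the Get-HotFix cmdlet to list installed updates; the function names and status labels are hypothetical. One limitation: a later cumulative update may also contain the fix, and this sketch does not detect that case.

```python
import subprocess

BAD_KB = "KB4338815"  # patch we traced the outages to
FIX_KB = "KB4345424"  # update that resolved them for us


def patch_status(installed_ids):
    """Classify a server from its list of installed hotfix IDs.

    The status labels are illustrative names, not Microsoft terminology.
    """
    ids = {hotfix_id.strip().upper() for hotfix_id in installed_ids}
    if FIX_KB in ids:
        return "fix_installed"
    if BAD_KB in ids:
        return "patch_without_fix"
    return "patch_not_found"


def installed_hotfixes():
    """List installed hotfix IDs via PowerShell's Get-HotFix (Windows only)."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", "(Get-HotFix).HotFixID"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

On the server itself, calling patch_status(installed_hotfixes()) returns one of the three labels. A result of "patch_without_fix" matches the situation we were in before applying KB4345424.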
Hope this helps anyone else who may be experiencing the same!
Note: we are running Alteryx Server 2018.1.4.44311