Jobs stuck in Initializing status

hody
5 - Atom

I have two copies of the same job stuck in the Initializing status in Alteryx Server.  All scheduled jobs are now getting queued and not executing.  Multiple service restarts and server reboots have not resolved the issue.

 

I found this in the log file repeating many times:

"S:\Alteryx\Service\AlteryxService_Client\Persistence_MongoDB.cpp: 1558. PersistenceContainer_MongoDBImpl_Get_Error: Record identifier is invalid <ID_REMOVED> collection <AS_Schedules>"
"PersistenceHandler_ReadBody_UnknownError: Resetting persistence containers and rethrowing."
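For anyone who wants to see how often this error repeats and which collections it points at, here is a minimal sketch that tallies the lines. It only matches the exact message format quoted above; the function names and the idea of passing in a log path are illustrative assumptions, not anything provided by Alteryx.

```python
import re
from collections import Counter

# Pattern derived from the single log line quoted in the post.
# Other error formats in the service log are not covered.
ERROR_RE = re.compile(
    r"PersistenceContainer_MongoDBImpl_Get_Error: "
    r"Record identifier is invalid (?P<record_id>\S+) "
    r"collection <(?P<collection>[^>]+)>"
)


def count_invalid_record_errors(lines):
    """Count invalid-record errors per (collection, record id) pair."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[(match.group("collection"), match.group("record_id"))] += 1
    return counts


def scan_log_file(path):
    """Hypothetical helper: tally errors in one log file given its path."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_invalid_record_errors(handle)
```

Calling `scan_log_file` on the service log and printing `.most_common()` on the result should show whether one record ID accounts for all the repeats or whether several are involved.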

 

While the server thinks these two jobs are initializing, there's no running AlteryxEngineCmd task.
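A rough way to script that check, assuming a Windows host and that the engine process shows up under the image name `AlteryxEngineCmd.exe` (the `.exe` suffix is my assumption). The sketch parses the CSV output of the standard `tasklist` command; the function names are made up for illustration.

```python
import csv
import io
import subprocess


def engine_pids(tasklist_csv, image_name="AlteryxEngineCmd.exe"):
    """Return PIDs from `tasklist /FO CSV /NH` output whose image matches."""
    pids = []
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # Rows look like: "Image Name","PID","Session Name","Session#","Mem Usage".
        # The "INFO: No tasks are running..." message has one field and is skipped.
        if len(row) >= 2 and row[0].lower() == image_name.lower():
            pids.append(int(row[1]))
    return pids


def running_engine_pids():
    """Windows only: ask tasklist for engine processes and return their PIDs."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH",
         "/FI", "IMAGENAME eq AlteryxEngineCmd.exe"],
        capture_output=True, text=True, check=True,
    )
    return engine_pids(result.stdout)
```

An empty list from `running_engine_pids()` while the Gallery still shows jobs as Initializing would match the symptom described here.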

 

I can see the two jobs in MongoDB in AS_Queue.
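If it helps anyone compare notes, below is a read-only sketch for listing what is in that collection with `pymongo`. The database name `AlteryxService` and the helper names are assumptions on my part, the connection URI is left for you to supply, and the code deliberately reports only `_id` and the top-level field names because I don't want to guess at the document schema. It does not modify anything.

```python
def open_queue_collection(uri, db_name="AlteryxService"):
    """Open AS_Queue. db_name is an assumption; adjust to your installation."""
    from pymongo import MongoClient  # third-party: pip install pymongo
    return MongoClient(uri)[db_name]["AS_Queue"]


def describe_queue_entries(collection, limit=20):
    """Return (_id, sorted field names) for up to `limit` queue documents.

    Read-only: only calls find(); no updates or deletes are issued.
    """
    rows = []
    for doc in collection.find({}, limit=limit):
        rows.append((doc.get("_id"), sorted(doc.keys())))
    return rows
```

Something like `describe_queue_entries(open_queue_collection(uri))` would then show the stuck entries alongside any others, which makes it easier to tell whether different sites are seeing the same shape of record.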

 

I tried to stop the job using Gallery Admin > Jobs > Status, but the jobs remain.  I also tried using Options > View Schedules in Alteryx Designer on the server, but the "Delete Queue Entry" button is disabled for the two jobs in the Initializing status.

 

Running version:

"AlteryxService version 2021.1.2.20534 (c) Alteryx, Inc. - All Rights Reserved."

 

Any suggestions on how to get the server up and running again during this holiday weekend?  I opened a case with Alteryx Support on Friday morning, but haven't heard anything back.

6 REPLIES
kgalbert
9 - Comet

Hi Hody,

 

We upgraded to 2021.1 on Friday and have the exact same problem.  Two jobs are stuck in the queue as Initializing, and neither server reboots nor service restarts (though we're forced to kill the process to get it to restart) have fixed the problem.

 

Let us know if you figure this out.

 

Thanks,
Ken

hody
5 - Atom

I'm still trying to get to the root cause.  After a suggestion from support and a bit of my own experimentation, I was able to get rid of the jobs.  However, I don't recommend following the same path until we find the actual cause.

 

In my testing, I seem to be having problems with workflows that use Gallery-defined Data Connections to SQL Server, while workflows that contain their own connection strings seem to be okay.  If you do your own testing, I suggest unchecking the workflow validation box when saving to the server.  You can then use the Run button in Gallery, which will time out after 30 seconds rather than getting totally stuck.

 

Again, I don't suggest trying this yourself, but I'll give you the details anyway.  Note that I have a single-server configuration, with one machine acting as both the controller and the worker.  To remove the stuck jobs, I did this:

- Changed the server config to NOT run unassigned jobs and added a Job tag that isn't currently used by any workflows

- Restarted the Alteryx Service (had to manually kill the task to complete the restart)

- Went to Gallery Admin > Jobs

- Deleted the jobs with the Minus button, which I was now able to use

- Reverted the worker job assignment changes

- Restarted the Alteryx Service

 

 

NPT
8 - Asteroid

We are experiencing the exact same problem.

 

I have narrowed down the issue to the Gallery database connections that are mentioned in @hody's post.  To reiterate, if I change the Input tool connection so that it does not use the Gallery connection, the process works fine.  So there seems to be an issue with the way Alteryx Server initializes a workflow that uses a Gallery connection.

 

This issue appeared only AFTER upgrading to Server 2021.1.

 

~ Nathan

 

[Screenshot: Stuck Jobs 2021-02-15_12-20-11.png]

kgalbert
9 - Comet

Has anyone tried to install their previous version of Alteryx to see if that would resolve the problem?

NPT
8 - Asteroid

Yeah, I was thinking the same thing. I'm going to do that now. I'll post the results.

NPT
8 - Asteroid

Rolling back to 2020.4 seems to have worked.

 

What a bummer 😔

 

~ Nathan