Alteryx Server Backup and Recovery Part 2: Procedures

KevinP
Alteryx Alumni (Retired)

This is the second article in a series on Alteryx Server backup and recovery. You can find Part 1 at:

 

Alteryx Server Backup and Recovery Part 1: Best Practices

 

As long as a backup of the Mongo database is available, you can get Alteryx Server back up and running. Luckily, backing up the embedded MongoDB is pretty simple, and can be done with a few console commands. I would recommend creating a batch file or script to perform the process. Doing so will allow you to schedule the backup using Windows Task Scheduler. The actual steps to perform a MongoDB backup are covered in detail in the online help under the server configuration section or at this direct link. I will also outline the steps below for completeness.

 

To create a backup of the MongoDB:

 

  1. Stop AlteryxService.
  2. Execute the following command to save a backup of the database in the specified folder:

 

alteryxservice emongodump=<path to backup location>
  3. Restart AlteryxService.

 

You can easily script this to a batch file with a few simple console commands. Keep in mind that paths may vary on your server, but it should look something like this.

 

Example:

 

 

"C:\Program Files\Alteryx\bin\AlteryxService.exe" stop
"C:\Program Files\Alteryx\bin\AlteryxService.exe" emongodump=Z:\Path\MongoBackup
"C:\Program Files\Alteryx\bin\AlteryxService.exe" start

 

 

You can add features such as logging and date/time stamps to the backups. As an example, I have included the code for a batch script I created that adds the following: logging with date/time stamps, a backup that is also date/time stamped, automated archival of the backup, copying the archive to a network location, and cleanup of the temp files.

 

Once you have a batch file or other script to perform your backups, test the script to ensure it works properly. Once testing is done, the next step is to schedule the backup. The easiest way to do this is to use Windows Task Scheduler. To create a scheduled task on Windows Server 2012, follow these steps:

 

Create a scheduled task:

 

  1. Open Task Scheduler and click “Create Task”.

  2. On the General tab, enter a “Name” and “Description”, select “Run whether user is logged on or not”, and select “Run with highest privileges”.

  3. On the Triggers tab, click “New”.

  4. A dialog box will appear. Define the schedule (daily, weekly, etc.) on which you want the backup to run and click “OK”.

  5. On the Actions tab, click “New”.

  6. In the dialog window, make sure “Start a program” is selected and click “Browse”. Select the batch file you created and click “Open”. Then click “OK”.

  7. Click “OK” on the Create Task window to finalize the creation of the backup task.
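
If you would rather create the task from an elevated command prompt, the same settings can be approximated with the built-in schtasks command. This is only a sketch: the task name, script path, and start time are hypothetical, and running as SYSTEM is an assumption. The SYSTEM account may not have access to your network share, in which case supply a service account with /RU and /RP instead.

```bat
:: Sketch only: task name, script path, and schedule are placeholders.
schtasks /Create /TN "AlteryxServerBackup" /TR "C:\Scripts\AlteryxBackup.bat" /SC DAILY /ST 02:00 /RU SYSTEM /RL HIGHEST

:: Confirm the task was registered.
schtasks /Query /TN "AlteryxServerBackup"
```

/RL HIGHEST corresponds to "Run with highest privileges", and a task that runs as SYSTEM runs whether or not a user is logged on.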

 

Now that you have implemented backup procedures and scheduled a task to automate the backups, it is time to discuss database restoration from a backup. The good news is that restoring the database is just as simple as backing it up. Assuming that 1) the server is functioning, 2) Alteryx Server is installed, and 3) you have a valid backup available, you can follow the steps outlined below.

 

To restore a backup of the MongoDB:

 

  1. Stop AlteryxService
  2. Execute the following command to restore the backup:

 

alteryxservice emongorestore=<path to backup location>,<path to MongoDB folder>

 

  3. Restart AlteryxService

 

Because restoration relies on the same simple command-line statements, recovery can also be scripted. However, since recovery actions are much less frequent, scripting them probably isn't necessary. Instead, you would just connect to the server, open a command prompt, and, following our backup example above, execute the following commands:

 

Example:

 

 

"C:\Program Files\Alteryx\bin\AlteryxService.exe" stop
"C:\Program Files\Alteryx\bin\AlteryxService.exe" emongorestore=Z:\Path\MongoBackup,C:\ProgramData\Alteryx\Service\Persistence\MongoDB
"C:\Program Files\Alteryx\bin\AlteryxService.exe" start

 

 

For Alteryx Server, we also recommend backing up the controller token and some settings files. While the server can be recovered without these files, having a backup of them can expedite the recovery process and ensures you will be able to decrypt any sensitive data in the database. The settings files we recommend backing up are:

 

C:\ProgramData\Alteryx\RuntimeSettings.xml

C:\ProgramData\Alteryx\Engine\SystemAlias.xml

C:\ProgramData\Alteryx\Engine\SystemConnections.xml
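
The two Engine files only exist once aliases or connections have been saved on the server, so a copy step can check for each file first. A minimal sketch, assuming a hypothetical destination folder C:\Temp\ServerBackup and the default ProgramData location:

```bat
@echo off
:: Hypothetical destination; change it to suit your environment.
SET BackupDir=C:\Temp\ServerBackup
IF NOT EXIST "%BackupDir%" mkdir "%BackupDir%"

FOR %%F IN ("%ProgramData%\Alteryx\RuntimeSettings.xml" "%ProgramData%\Alteryx\Engine\SystemAlias.xml" "%ProgramData%\Alteryx\Engine\SystemConnections.xml") DO (
  IF EXIST %%F (copy %%F "%BackupDir%") ELSE (echo Skipping %%F: file not present)
)
```

Each path is quoted inside the FOR set, so %%F already carries its quotes when it is passed to IF EXIST and copy.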

 

Again, please keep in mind the exact paths may vary depending on the server configuration and where the backup is located. This example also assumes the backup isn't compressed/archived. If you are using a backup script that archives the backup and copies it to network storage, you will need to copy the backup file to the server and decompress the archive before running the recovery commands above.
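
For an archived backup, the extra steps might look like the sketch below. The archive name, the RestoreDir folder, and the assumption that the archive has already been copied to C:\Temp are all hypothetical. The folder layout assumes the archive was created from a ServerBackup_<timestamp> folder containing a Mongo subfolder, as in the sample backup script.

```bat
@echo off
:: Hypothetical paths and archive name; adjust for your environment.
SET AlteryxService="C:\Program Files\Alteryx\bin\AlteryxService.exe"
SET ZipUtil="C:\Program Files\7-Zip\7z.exe"
SET BackupName=ServerBackup_20190104_020000
SET Archive=C:\Temp\%BackupName%.7z
SET RestoreDir=C:\Temp\Restore
SET MongoDir=C:\ProgramData\Alteryx\Service\Persistence\MongoDB

:: Extract the archive, keeping its folder structure, into RestoreDir.
%ZipUtil% x %Archive% -o%RestoreDir% -y
IF errorlevel 1 (
  echo Extraction failed, aborting restore.
  EXIT /B 1
)

:: Same stop, restore, start sequence as the uncompressed example.
%AlteryxService% stop
%AlteryxService% emongorestore=%RestoreDir%\%BackupName%\Mongo,%MongoDir%
%AlteryxService% start
```

As with the backup, make sure the service has fully stopped before the restore command runs.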

 

 

Below is the code for my sample batch script:

 

::-----------------------------------------------------------------------------
::
:: AlteryxServer Backup Script v.2.0.2 - 01/04/19
:: Created By: Kevin Powney
::
:: Service start and stop checks adapted from example code by Eric Falsken
::
::-----------------------------------------------------------------------------

@echo off

::-----------------------------------------------------------------------------
:: Set variables for Log, Temp, Network, and Application Paths
::
:: Please update these values as appropriate for your environment. Note
:: that spaces should be avoided in the LogDir, TempDir, and NetworkDir paths.
:: The trailing slash is also required for these paths.
::-----------------------------------------------------------------------------

SET LogDir=C:\ProgramData\Alteryx\BackupLog\
SET TempDir=C:\Temp\
SET NetworkDir=\\ServerName\SharePath\
SET AlteryxService="C:\Program Files\Alteryx\bin\AlteryxService.exe"
SET ZipUtil="C:\Program Files\7-Zip\7z.exe"

:: Set the maximum time to wait for the service to start or stop in whole seconds. Default value is 2 hours.
SET MaxServiceWait=7200

::-----------------------------------------------------------------------------
:: Set Date/Time to a usable format and create log
::-----------------------------------------------------------------------------

FOR /f %%a IN ('WMIC OS GET LocalDateTime ^| FIND "."') DO SET DTS=%%a
SET DateTime=%DTS:~0,4%%DTS:~4,2%%DTS:~6,2%_%DTS:~8,2%%DTS:~10,2%%DTS:~12,2%
SET /a tztemp=%DTS:~21%/60
SET tzone=UTC%tztemp%

echo %date% %time% %tzone%: Starting backup process... > %LogDir%BackupLog%datetime%.log
echo. >> %LogDir%BackupLog%datetime%.log

::-----------------------------------------------------------------------------
:: Stop Alteryx Service
::-----------------------------------------------------------------------------

echo %date% %time% %tzone%: Stopping Alteryx Service... >> %LogDir%BackupLog%datetime%.log
echo. >> %LogDir%BackupLog%datetime%.log

SET COUNT=0

:StopInitState
SC query AlteryxService | FIND "STATE" | FIND "RUNNING" >> %LogDir%BackupLog%datetime%.log
IF errorlevel 0 IF NOT errorlevel 1 GOTO StopService
SC query AlteryxService | FIND "STATE" | FIND "STOPPED" >> %LogDir%BackupLog%datetime%.log
IF errorlevel 0 IF NOT errorlevel 1 GOTO StopedService
SC query AlteryxService | FIND "STATE" | FIND "PAUSED" >> %LogDir%BackupLog%datetime%.log
IF errorlevel 0 IF NOT errorlevel 1 GOTO SystemError
echo %date% %time% %tzone%: Service State is changing, waiting for service to resolve its state before making changes >> %LogDir%BackupLog%datetime%.log
SC query AlteryxService | Find "STATE"
timeout /t 1 /nobreak >NUL
SET /A COUNT=%COUNT%+1
IF "%COUNT%" == "%MaxServiceWait%" GOTO SystemError 
GOTO StopInitState

:StopService
SET COUNT=0
SC stop AlteryxService >> %LogDir%BackupLog%datetime%.log
GOTO StoppingService

:StopServiceDelay
echo %date% %time% %tzone%: Waiting for AlteryxService to stop >> %LogDir%BackupLog%datetime%.log
timeout /t 1 /nobreak >NUL
SET /A COUNT=%COUNT%+1
IF "%COUNT%" == "%MaxServiceWait%" GOTO SystemError 

:StoppingService
SC query AlteryxService | FIND "STATE" | FIND "STOPPED" >> %LogDir%BackupLog%datetime%.log
IF errorlevel 1 GOTO StopServiceDelay

:StopedService
echo %date% %time% %tzone%: AlteryxService is stopped >> %LogDir%BackupLog%datetime%.log

::-----------------------------------------------------------------------------
:: Backup MongoDB to local temp directory. 
::-----------------------------------------------------------------------------

echo. >> %LogDir%BackupLog%datetime%.log
echo %date% %time% %tzone%: Starting MongoDB Backup... >> %LogDir%BackupLog%datetime%.log
echo. >> %LogDir%BackupLog%datetime%.log

%AlteryxService% emongodump=%TempDir%ServerBackup_%datetime%\Mongo >> %LogDir%BackupLog%datetime%.log

::-----------------------------------------------------------------------------
:: Backup Config files to local temp directory. 
::-----------------------------------------------------------------------------

echo. >> %LogDir%BackupLog%datetime%.log
echo %date% %time% %tzone%: Backing up settings, connections, and aliases... >> %LogDir%BackupLog%datetime%.log
echo. >> %LogDir%BackupLog%datetime%.log

copy %ProgramData%\Alteryx\RuntimeSettings.xml %TempDir%ServerBackup_%datetime%\RuntimeSettings.xml >> %LogDir%BackupLog%datetime%.log
copy %ProgramData%\Alteryx\Engine\SystemAlias.xml %TempDir%ServerBackup_%datetime%\SystemAlias.xml
copy %ProgramData%\Alteryx\Engine\SystemConnections.xml %TempDir%ServerBackup_%datetime%\SystemConnections.xml
%AlteryxService% getserversecret > %TempDir%ServerBackup_%datetime%\ControllerToken.txt

::-----------------------------------------------------------------------------
:: Restart Alteryx Service
::-----------------------------------------------------------------------------

echo. >> %LogDir%BackupLog%datetime%.log
echo %date% %time% %tzone%: Restarting Alteryx Service... >> %LogDir%BackupLog%datetime%.log
echo. >> %LogDir%BackupLog%datetime%.log

SET COUNT=0

:StartInitState
SC query AlteryxService | FIND "STATE" | FIND "STOPPED" >> %LogDir%BackupLog%datetime%.log
IF errorlevel 0 IF NOT errorlevel 1 GOTO StartService
SC query AlteryxService | FIND "STATE" | FIND "RUNNING" >> %LogDir%BackupLog%datetime%.log
IF errorlevel 0 IF NOT errorlevel 1 GOTO StartedService
SC query AlteryxService | FIND "STATE" | FIND "PAUSED" >> %LogDir%BackupLog%datetime%.log
IF errorlevel 0 IF NOT errorlevel 1 GOTO SystemError
echo %date% %time% %tzone%: Service State is changing, waiting for service to resolve its state before making changes >> %LogDir%BackupLog%datetime%.log
SC query AlteryxService | Find "STATE"
timeout /t 1 /nobreak >NUL
SET /A COUNT=%COUNT%+1
IF "%COUNT%" == "%MaxServiceWait%" GOTO SystemError 
GOTO StartInitState

:StartService
SET COUNT=0
SC start AlteryxService >> %LogDir%BackupLog%datetime%.log
GOTO StartingService

:StartServiceDelay
echo %date% %time% %tzone%: Waiting for AlteryxService to start >> %LogDir%BackupLog%datetime%.log
timeout /t 1 /nobreak >NUL
SET /A COUNT=%COUNT%+1
IF "%COUNT%" == "%MaxServiceWait%" GOTO SystemError 

:StartingService
SC query AlteryxService | FIND "STATE" | FIND "RUNNING" >> %LogDir%BackupLog%datetime%.log
IF errorlevel 1 GOTO StartServiceDelay

:StartedService
echo %date% %time% %tzone%: AlteryxService is started >> %LogDir%BackupLog%datetime%.log

::-----------------------------------------------------------------------------
:: This section compresses the backup to a single zip archive
::
:: Please note the command below requires 7-Zip to be installed on the server.
:: You can download 7-Zip from http://www.7-zip.org/ or change the command to
:: use the zip utility of your choice as defined in the variable above.
::-----------------------------------------------------------------------------

echo. >> %LogDir%BackupLog%datetime%.log
echo %date% %time% %tzone%: Archiving backup... >> %LogDir%BackupLog%datetime%.log

%ZipUtil% a %TempDir%ServerBackup_%datetime%.7z %TempDir%ServerBackup_%datetime% >> %LogDir%BackupLog%datetime%.log

::-----------------------------------------------------------------------------
:: Move zip archive to network storage location and cleanup local files
::-----------------------------------------------------------------------------

echo. >> %LogDir%BackupLog%datetime%.log
echo %date% %time% %tzone%: Moving archive to network storage >> %LogDir%BackupLog%datetime%.log
echo. >> %LogDir%BackupLog%datetime%.log

copy %TempDir%ServerBackup_%datetime%.7z %NetworkDir%ServerBackup_%datetime%.7z >> %LogDir%BackupLog%datetime%.log

del %TempDir%ServerBackup_%datetime%.7z >> %LogDir%BackupLog%datetime%.log
rmdir /S /Q %TempDir%ServerBackup_%datetime% >> %LogDir%BackupLog%datetime%.log

::-----------------------------------------------------------------------------
:: Done
::-----------------------------------------------------------------------------

echo. >> %LogDir%BackupLog%datetime%.log
echo %date% %time% %tzone%: Backup process completed >> %LogDir%BackupLog%datetime%.log
GOTO :EOF

:SystemError
echo. >> %LogDir%BackupLog%datetime%.log
echo %date% %time% %tzone%: Error starting or stopping service. Service is not accessible, is offline, or did not respond to the start or stop request within the designated time frame. >> %LogDir%BackupLog%datetime%.log
 

Multi-Node Environments

 

Multi-node environments require all nodes to be brought down before performing the backup. This may not seem necessary, but we have seen instances in Support where not following this procedure has resulted in corruption in MongoDB. To bring all nodes down and back up gracefully, please follow these steps.
  1. Stop AlteryxService on Gallery machine (if Gallery is on the same node as the Controller, skip this step)
  2. Stop AlteryxService on all Worker nodes not on the Controller node
    1. Wait for service to fully stop. Workflows running can cause this to take some time.
  3. Stop AlteryxService on the Controller node
  4. Perform backup as documented above
  5. Start AlteryxService on the Controller node
  6. Start AlteryxService on all Worker nodes
  7. Start AlteryxService on the Gallery node

This process will require coordination between multiple servers. We leave the task of this coordination up to the user as there are too many options to list and they require individual configuration for the user's environment.
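
As one possible starting point, the ordering above can be sketched with the remote form of the SC command. The host names are hypothetical, and the sketch assumes the account running it has rights to control services on every node. If Gallery runs on the Controller node, drop the Gallery variable from both loops.

```bat
@echo off
SETLOCAL
:: Hypothetical host names; replace with your own nodes.
SET Gallery=GALLERY01
SET Workers=WORKER01 WORKER02
SET Controller=CONTROLLER01
SET MaxServiceWait=7200

:: Stop order: Gallery, Workers, Controller.
FOR %%h IN (%Gallery% %Workers% %Controller%) DO (
  SC \\%%h stop AlteryxService
  CALL :WaitForState %%h STOPPED || GOTO :Failed
)

REM Perform the backup on the Controller here, as documented above.

:: Start order: Controller, Workers, Gallery.
FOR %%h IN (%Controller% %Workers% %Gallery%) DO (
  SC \\%%h start AlteryxService
  CALL :WaitForState %%h RUNNING || GOTO :Failed
)
echo All nodes restarted.
EXIT /B 0

:Failed
echo A node did not reach the expected state in time.
EXIT /B 1

:WaitForState
:: %1 = host name, %2 = expected state (STOPPED or RUNNING)
SET COUNT=0
:WaitLoop
SC \\%1 query AlteryxService | FIND "STATE" | FIND "%2" >NUL
IF NOT errorlevel 1 EXIT /B 0
IF %COUNT% GEQ %MaxServiceWait% EXIT /B 1
timeout /t 1 /nobreak >NUL
SET /A COUNT+=1
GOTO WaitLoop
```

The wait loop polls once per second, in the same way as the sample backup script, so a node with running workflows has up to MaxServiceWait seconds to stop before the script gives up.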

 

 
Comments
Coxta45
11 - Bolide

@KevinP,

 

First and foremost, thanks so much for the comprehensive write-up!

 

I've been able to run an edited version of your batch script and schedule it successfully.  It's a big relief to have some backups running and I'd encourage all server admins to implement this process - we've learned the hard way that it's imperative to have the ability to restore promptly.

 

Two Questions:  

 

  1. Any idea why I can't find the SystemAlias.xml and/or SystemConnections.xml files on our server?  I was able to locate and copy the RuntimeSettings.xml file (location shown in batch script below)
  2. I noticed that the log reads - during the mongodump - that  DBName=AlteryxService;  Does the mongodump also make a copy of the AlteryxGallery database and its collections?

I thought I'd share the batch script that worked for us in case others may find it helpful.  I had to make some very slight adjustments, mainly just adding quotes around my copy statements due to spaces in the file paths causing syntax errors.  I also created a %NewDir% variable for the 7-zip copy destination.

 

Batch Script:

 

::-----------------------------------------------------------------------------
::
:: AlteryxServer Backup Script v1.0 - 5/25/2016
:: Created By: Kevin Powney
:: Edited: 2/15/2017 by Taylor Cox
::
::-----------------------------------------------------------------------------

@echo off

::-----------------------------------------------------------------------------
:: Set variables for Log and Temp directories
::-----------------------------------------------------------------------------

SET LogDir=E:\ProgramData\Alteryx\BackupLogs\
SET TempDir=C:\AlteryxBackupResources\Temp\
SET NewDir=I:\MarketingAnalytics\Alteryx\ServerBackups\

::-----------------------------------------------------------------------------
:: Set Date/Time to a usable format and create log
::-----------------------------------------------------------------------------

FOR /f %%a IN ('WMIC OS GET LocalDateTime ^| FIND "."') DO SET DTS=%%a
SET DateTime=%DTS:~0,4%%DTS:~4,2%%DTS:~6,2%_%DTS:~8,2%%DTS:~10,2%%DTS:~12,2%


echo %date% %time%: Starting backup process > %LogDir%BackupLog%datetime%.log
echo. >> %LogDir%BackupLog%datetime%.log

::-----------------------------------------------------------------------------
:: Stop Alteryx Service
::-----------------------------------------------------------------------------

echo %date% %time%: Stopping Alteryx Service >> %LogDir%BackupLog%datetime%.log
echo. >> %LogDir%BackupLog%datetime%.log

NET STOP AlteryxService >> %LogDir%BackupLog%datetime%.log

::-----------------------------------------------------------------------------
:: Backup MongoDB to local temp directory.
::-----------------------------------------------------------------------------

echo %date% %time%: Starting MongoDB Backup >> %LogDir%BackupLog%datetime%.log
echo. >> %LogDir%BackupLog%datetime%.log

"E:\Program Files\Alteryx\bin\AlteryxService.exe" emongodump=%TempDir%ServerBackup_%datetime%\Mongo >> %LogDir%BackupLog%datetime%.log

:: pause

::-----------------------------------------------------------------------------
:: Backup config files to local temp directory.
::-----------------------------------------------------------------------------

echo. >> %LogDir%BackupLog%datetime%.log
echo %date% %time%: Backing up settings, connections, and aliases >> %LogDir%BackupLog%datetime%.log
echo. >> %LogDir%BackupLog%datetime%.log

copy "E:\Program Files\Alteryx\bin\RuntimeData\RuntimeSettings.xml" "%TempDir%ServerBackup_%datetime%\RuntimeSettings.xml" >> %LogDir%BackupLog%datetime%.log

:: pause

::-----------------------------------------------------------------------------
:: Restart Alteryx Service
::-----------------------------------------------------------------------------

echo %date% %time%: Restarting Alteryx Service >> %LogDir%BackupLog%datetime%.log
echo. >> %LogDir%BackupLog%datetime%.log

NET START AlteryxService >> %LogDir%BackupLog%datetime%.log

:: pause

::-----------------------------------------------------------------------------
:: This section compresses the backup to a single zip archive
::
:: Please note the command below requires 7-Zip to be installed on the server.
:: You can download 7-Zip from http://www.7-zip.org/ or change the command to
:: use the zip utility of your choice.
::-----------------------------------------------------------------------------

echo %date% %time%: Archiving backup >> %LogDir%BackupLog%datetime%.log

"c:\Program Files\7-Zip\7z.exe" a %TempDir%ServerBackup_%datetime%.7z %TempDir%ServerBackup_%datetime% >> %LogDir%BackupLog%datetime%.log

:: pause

::-----------------------------------------------------------------------------
:: Move zip archive to network storage location and cleanup local files
::-----------------------------------------------------------------------------

echo. >> %LogDir%BackupLog%datetime%.log
echo %date% %time%: Moving archive to network storage >> %LogDir%BackupLog%datetime%.log
echo. >> %LogDir%BackupLog%datetime%.log

:: Be sure to update the UNC path for the network location to copy the file to.
copy "%TempDir%ServerBackup_%datetime%.7z" "%NewDir%" >> %LogDir%BackupLog%datetime%.log

:: pause

del %TempDir%ServerBackup_%datetime%.7z >> %LogDir%BackupLog%datetime%.log
rmdir /S /Q %TempDir%ServerBackup_%datetime% >> %LogDir%BackupLog%datetime%.log

:: pause

::-----------------------------------------------------------------------------
:: Done
::-----------------------------------------------------------------------------

echo. >> %LogDir%BackupLog%datetime%.log
echo %date% %time%: Backup process completed >> %LogDir%BackupLog%datetime%.log

 

Thanks again,

 

Taylor

 

KevinP
Alteryx Alumni (Retired)

@Coxta45

 

To answer your questions:

 

The SystemAlias.xml and SystemConnections.xml files may not be present in all environments. These files store database connection aliases and information for standard and In-DB connections. If your server doesn't have any saved database aliases or In-DB connections, these files will not be present. As an additional note, the RuntimeSettings.xml file you want to back up should always be in C:\ProgramData\Alteryx (%ALLUSERSPROFILE%\Alteryx). The version you found on your E drive is likely the default file included with the application installation. You don't need to back up that copy, and you shouldn't make edits to it. All changes from the default configuration should be stored in the copy found in ProgramData.

 

If you are using the emongodump option of the AlteryxService to perform the backup, the backup data will include the following items:

 

  • admin database
  • AlteryxGallery database
  • AlteryxGallery_Lucene database
  • AlteryxService database
  • ASCredentials.bin file
  • ASMongoDBVersion.bin file
  • mongocontroller.log file
  • mongoDump.log file

This list includes all items needed to restore the database. This is why we recommend performing the backup via our service when possible.

 

In regard to the batch script, I am glad you found it useful, and thank you for sharing your edits with the community.

Coxta45
11 - Bolide

@KevinP - Once again, thanks!  

 

While making the change to copy the correct RuntimeSettings.xml file, I was also able to locate our SystemAlias.xml and SystemConnections.xml files (and begin copying them in the batch script).

 

Location references:

copy "C:\ProgramData\Alteryx\Engine\SystemAlias.xml" "%TempDir%ServerBackup_%datetime%\SystemAlias.xml" >> %LogDir%BackupLog%datetime%.log
copy "C:\ProgramData\Alteryx\Engine\SystemConnections.xml" "%TempDir%ServerBackup_%datetime%\SystemConnections.xml" >> %LogDir%BackupLog%datetime%.log
MarqueeCrew
20 - Arcturus

Would you happen to have a powershell script? 

 

I see the Pause commands in here and wonder if that then requires operator involvement in this operation or what happens?  Ideally, this is auto-magic.

 

I also didn't see the copying of the RuntimeSettings, SystemAlias, and SystemConnections xml files in this procedure.  Should it be included as a best practice?

 

Are there any other updates to include?

 

Cheers,

Mark

KevinP
Alteryx Alumni (Retired)

@MarqueeCrew Sorry, I don't have a powershell version of this script. However, my original script also doesn't have any 'pause' functionality and should work without any user intervention as long as any needed tweaks are made so the script fits your environment. The modified version of my script @Coxta45 posted does have lines with 'pause' listed but these are comments and wouldn't have any impact on the code. I assume he was using them to debug the script for his particular use case and just commented them out once he finished.

 

In regard to backing up the RuntimeSettings, SystemAlias, and SystemConnections xml my script does backup all of these on lines 53,54, & 55. The code does use a hard path though and it would probably be better to use the environmental variable instead just in case the ProgramData location has been changed. Something like the snip below would be technically more accurate.

 

copy %ProgramData%\Alteryx\RuntimeSettings.xml %TempDir%ServerBackup_%datetime%\RuntimeSettings.xml
copy %ProgramData%\Alteryx\Engine\SystemAlias.xml %TempDir%ServerBackup_%datetime%\SystemAlias.xml
copy %ProgramData%\Alteryx\Engine\SystemConnections.xml %TempDir%ServerBackup_%datetime%\SystemConnections.xml

 

 You could also take this same concept and set variables for all of the hard-coded paths in the script, such as the paths for the service, network locations, and 7-Zip. Then you could just update the variables with the correct paths for your environment. Maybe I will do this if I have some spare time in the near future and post an updated version.

r4upadhye
11 - Bolide
I wanted to create a manual backup of MongoDB, but somehow it didn't work. Can you advise me on this thread: https://community.alteryx.com/t5/Alteryx-Server-Discussions/MongoDb-backup/m-p/198728#M2106 Thanks, Rahul
Derangedvisions
11 - Bolide

Has anyone had any issues with this? We were running it for several months - however the last month or so we kept having an issue with the Alteryx Service not restarting after the back up.  The error in the log files was "no suitable servers found", which sounds like it's having an issue with the MongoDB? I've had to disable the back up for now- but would love to find out what we did wrong(or if anyone else has come across the same issue) (have to thank Alteryx Support (thanks Peter!) for helping me figure out why our service was stopping in the first place!)

 

sgabriel62
7 - Meteor

Working with a colleague - we have converted the batch file listed within this thread into a PowerShell script.  It's completely possible to perform the conversion.

We are running tests against each of our environments and of  course with each run we come up with new ideas to add to it.

@DerangedVisions -  what changes  have you made to your environment that presented your error?  Is this error in conjunction with your backup script?  Manual backups for us work as well.   In regards to your service  not restarting -  is your mongodb.lock file in your Persistence path 0k when the service is stopped?  if its 1k or higher, you need to move it out of the way and recreate the empty lock file so its 0k in size - then restart your service.

 

Derangedvisions
11 - Bolide

@sgabriel62 i'm not aware of any changes to the environment. The Batch file would run 98% of the time, and randomly it wasn't able to restart the service ("no suitable servers found" ) prior to the batch file issue- we did have some locks that were removed with the help of the Alteryx support team right before the issues with the back up, but I haven't checked since.

 

 

STison
5 - Atom

Hi all

We are having the same issue.

The service does not start again after the backup.

The backup completes and looks fine but the service start fails.

Even if we run it from the command line.

Have to kill mongod.lock then it starts.

But is not 100%. Have to stop it.

Then kill all alteryx process in Task manager.

Then start again - all good.

KevinP
Alteryx Alumni (Retired)

I am currently aware of two scenarios surrounding this process and my sample script that can cause issues with the backup and the service stop/start processes.

 

The first scenario affects scripts based on versions of the sample prior to 1.5, or if your Alteryx Server is version 2018.3. In these scenarios, any delay in the service shutdown (typically due to jobs running at the time the backup is started) will cause the backup script to proceed with the backup while the service is still running. This causes the service start commands to fail because the service is still technically running even though it is in the process of stopping. Eventually the service will stop, leaving the server in an offline state. If you are not running Alteryx Server 2018.3, version 1.5 or higher of my sample script should prevent this with the changes to the service start and stop commands. If you are on Alteryx Server 2018.3, you will need to schedule your backups during a period where no jobs are running, or monitor the process manually to ensure the service stops and restarts properly. I am also working on an updated script that should address this through better handling of the service's state changes, and I will update the article with the new script as soon as it has been completed and tested.

 

We have also seen some issues with 2018.x running on Windows Server 2008 R2 where the mongo process doesn't properly indicate to the service that it has exited. The issue occurs inconsistently, and we haven't been able to identify the exact scenario or cause yet. If you find that the backup process hangs during the backup (the script will continue updating the screen with .... forever), you are likely encountering this behavior. This behavior can also affect service state changes after the service has been running for a few days. Unfortunately, I can't correct this behavior with the sample script, as it seems to be an interaction between our service and the mongod process in certain environments. If you encounter this issue, I would recommend upgrading to the latest version of Alteryx Server and ensuring you have all available Windows patches installed. If the issue still persists, you may want to consider upgrading to Windows Server 2012 or 2016.

sgabriel62
7 - Meteor

We have converted your batch script into PowerShell.   We've added a conditional statement to forcefully stop the service after a sleep period has elapsed.  Meaning it will wait for the lock file status to change from 1k to 0k on its own.  Once executed, we force the creation of an empty lock file prior to restart of the service so that once the dumps are completed the service will start without fail.  But there will be the occasion where this can't be helped because of a hung session or Windows misbehaving.   Advantage version 2018.x - Workflow caching - where the workflow rerun continues where it left off.

 

KevinP
Alteryx Alumni (Retired)

Version 2.0 of the example backup script is available now, and the article has been updated to reflect the new version. This version handles service start and stop requests in a more dynamic and graceful manner. Please note that if the timeout value is exceeded, you may encounter a scenario where the service eventually shuts down and isn't restarted. This would be expected depending on where the timeout occurred, and the script's log would reflect that the service stop or start failed. I have defaulted the timeout to 2 hours to try and give the service ample time to respond, but this may not be sufficient for some environments. If you find you are frequently encountering the service timeout, try increasing the timeout value.

KevinP
Alteryx Alumni (Retired)

Updated example script to version 2.0.2. This version only includes some logging improvements. Specifically more messages are timestamped, spacing has been improved, and time stamps now include timezone information.

Toby
5 - Atom

@, why not post your Powershell script to help others?

sgabriel62
7 - Meteor

OK Folks -  Here is a generic breakdown of the PowerShell script.   What the script does is:  Stops Alteryx service, performs the dump, restarts the service so your users can get back to work.  Then as a background process, we copy the dump location from the local server to the NAS for DR replication.  Do a bit of cleanup and it's done.

This will get you started.  Obviously my Production version works so I couldn't share it with you but - as an example -  a 13GB DB will take approx 3 min to dump, vs. a heavy content 63GB DB, which takes nearly an hour.   On average it's 30 minutes but again it's all down to the content of the DB.

This same script can be used to perform restores if you are on a regular maintenance schedule - but I manually perform my restores from the command line for more control.

Any questions feel free to ask:   Hope this helps and good luck

--------------------------------------------------------------------------------------------------------------------

# Mongod backup Script
# stop mongod.exe, backup the db, make local copy of files, start mongod.exe, create network archive of files.

#Error color set to yellow, to cause less panic.
$host.PrivateData.ErrorForegroundColor = 'Yellow'

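# [CmdletBinding()] removed: the attribute is only valid directly above a param() block and causes a parse error here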

$totalScriptTime = Measure-Command {
# Set log location
#$log = "\\SAN/NAS name\alteryx_prod0001\MongodbBackups\dbbackup.csv"
$log = "E:\Temp\dbbackup.csv"
# set local backup location
$dest = "E:\temp\Mongo_backup"
[System.Collections.ArrayList]$source
#Set network backup location
$archive = "\\SAN/NAS name\alteryx_prod0001\MongodbBackups"
# List of files to backup
#$persist = "E:\ProgramData\Alteryx\Service\Persistence\"
$filelist = @( # Add or remove files
"E:\temp"
"C:\ProgramData\Alteryx\RuntimeSettings.xml"
)

$lockfile = "E:\programdata\alteryx\service\persistence\MongoDB\Mongod.lock"

#$filelistpath = ""
#$filelist = import-csv "$filelistpath"

# $Source is the variable for compression and archiving.
[System.Collections.ArrayList]$Source = @( # Add or remove files
"C:\ProgramData\Alteryx\RuntimeSettings.xml"
)
#Format for adding additional directories.
# $Source.Add("C:\users\cstone42\desktop\gifs")

# Test if the local backup location exists, if not, create it
if (!(test-path $dest))
{
Write-Output "$dest does not exist, creating now"
New-Item -ItemType directory -Path $dest
}

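# $log is a CSV file, so only make sure its parent folder exists; Export-Csv creates the file itself
$logDir = Split-Path $log -Parent
if (!(test-path $logDir))
{
Write-Output "$logDir does not exist, creating now"
New-Item -ItemType directory -Path $logDir
}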

if (!(test-path $archive))
{
Write-Output "$archive does not exist, creating now"
New-Item -ItemType directory -Path $archive
}

# Function for logging
function SendOutput()
{
Param(
[string[]]$output
)

$now = get-date -format "MM/dd/yyyy HH:mm:ss"
# Build a hashtable of the collected data + computername (because thats important)
$Properties = @{ComputerName = $env:COMPUTERNAME
now = $now | Out-String
output = $output | Out-String
}

# importing hashtable to properties of new windows object
$obj = New-Object -TypeName PSObject -Property $Properties
Write-Output $obj | Select-Object -Property now,ComputerName,output | Export-Csv -Path $log -NoTypeInformation -Append
Write-Output $obj | Select-Object -Property output | convertTo-csv | write-host -ForegroundColor Magenta
}

# write to log
$output ='Beginning backup script';SendOutput -output $output
#
# Stop mongod.exe
Write-Warning "Stopping AlteryxService.exe"
Set-Service -ServiceName AlteryxService -StartupType Disabled
Stop-Service AlteryxService | Out-Null
#start-sleep 780

$processname = "AlteryxService"
# establish a loop counter
$i = 0
# Get info about the process
$p = Get-process -Name "$processname" -ErrorAction SilentlyContinue # Pick a process name
# If there is no running process, say so.
if (!($p)){Write-Host "Process does not exist. These are not the droids you're looking for. Move along. Move along!" -ForegroundColor Green}
# Show me the PID so I know you're telling the truth and the script is still moving along.
$p.Id
# In the name of honest transparency, let people know.
If ($p){write-host "$processname is alive, ITS ALIVE!!" -ForegroundColor Yellow}
# We can only kill something if we are in imminent danger:
While ($p) # As long as the above named process exists,
# ... ITS COMING RIGHT FOR US!! KILL IT!!!
{
# Check whether Get-Process reports the process as .HasExited; if so, break
if ($p.HasExited)
{Write-host "Process $p has exited" -ForegroundColor Green ;Break}
# Give it some time
start-sleep 30
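# Update the Process info (returns nothing once the process is gone, so leave the loop)
$p = Get-process -Name "$processname" -ErrorAction SilentlyContinue
if (!$p) {break}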
# Set to a number above the forced shutdown, to catch any runaway loops.
if($i -ge 22)
{break}
# Set to the max number of attempts before using a forced stop.
if ($i -ge 20)
{Stop-process -force $p}
# Output the number of loops we are at, so people know the script is still running
write-host "$i"; $i++
# Politely ask the process to stop and continue to check on it via the loop.
Stop-process $p
} # End of Loop

#$output = "Result from stopping Alteryx: $lastexitcode";SendOutput -output $output
start-sleep 2

Remove-Item "$lockfile" -Force -ErrorAction SilentlyContinue
if (!(test-path $lockfile))
{
New-Item $lockfile
}
else {
$output ='Its not you, its me ... No, its you. I give up, you are on your own. Im leaving you. Good-Bye';SendOutput -output $output
Write-Warning "There is a problem with the lock file, Help me Obi-Wan Kenobi, you're my only hope"
Start-Sleep 2
Write-Warning "The Vogon destructor fleet is in position. So long and thanks for all the fish"
Break
}
<#
$isASRunning = get-service -Name AlteryxService
if ($isASRunning.Status -ne 'Stopped')
{
$output ='AlteryxService did not stop normally, using force and removing lock file.';SendOutput -output $output
Write-Warning "$output"
Stop-Service -Name AlteryxService -force
Remove-Item "$lockfile" -Force
if (!(test-path $lockfile))
{
New-Item $lockfile
}
else {
$output ='Its not you, its me ... No, its you. I give up, you are on your own. Im leaving you. Good-Bye';SendOutput -output $output
Write-Warning "There is a problem with the lock file, Help me Obi-Wan Kenobi, you're my only hope"
Start-Sleep 2
Write-Warning "The Vogon destructor fleet is in position. So long and thanks for all the fish"
Break
}
}#>

# more logging
$output = "Starting Mongo db backup";SendOutput -output $output

# Creating the dump
$dumpname = "MongodbDump_$(Get-Date -f yyyy-MM-dd-HH-mm)"
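#$dump = & "E:\Program Files\Alteryx\bin\AlteryxService.exe" emongodump=E:\temp\mongod_dmp -Wait
# The call operator (&) waits for a console program to finish and returns its output, not a process object,
# so check $LASTEXITCODE instead (this assumes AlteryxService.exe returns non-zero on failure).
$dump = & "E:\Program Files\Alteryx\bin\AlteryxService.exe" "emongodump=E:\temp\$dumpname"
$result = $LASTEXITCODE
if ($result -ne 0)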
{
$output = "Mongo backup failed during dump"
SendOutput -output $output
break
}
else
{
$output = "Mongo dump completed"
SendOutput -output $output
$Source.Add("E:\temp\$dumpname")
}

# Restarting the alteryx service
Set-Service -ServiceName AlteryxService -StartupType Automatic
$output = "Starting Alteryx Service";SendOutput -output $output
Start-Service AlteryxService | Out-Null
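# Start-Service is a cmdlet and does not set $lastexitcode, so log the service status instead
$output = "Alteryx Service status after start: $((Get-Service AlteryxService).Status)";SendOutput -output $output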
#>

###
#
#write-warning "waiting for files to copy"
#get-childitem $dumpname -Recurse | Copy-Item -Destination $dest\$dumpname | out-null
#start-sleep 1
#
##
####
#########
#####

start-sleep 5


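# copy and hash check to check data integrity
foreach ($file in $filelist)
{
$SourceFile = $file
#$SimpleName = [System.IO.Path]::GetFileName("$SourceFile")
Get-ChildItem $SourceFile | Copy-item -Recurse -Destination $archive

# Get-FileHash only works on files, so compare every source file with its copy in $archive
if (Test-Path $SourceFile -PathType Container)
{$base = (Get-Item $SourceFile).FullName; $files = @(Get-ChildItem $SourceFile -Recurse -File)}
else
{$base = Split-Path $SourceFile -Parent; $files = @(Get-Item $SourceFile)}
$different = @($files | Where-Object {
$copy = Join-Path $archive $_.FullName.Substring($base.Length)
!(Test-Path $copy) -or ((Get-FileHash $_.FullName).Hash -ne (Get-FileHash $copy).Hash)
})

if ($different.Count -gt 0)
{
#sloppy messages ... need to clean it up
$output = "Copy to $archive Failed - $file is different"; SendOutput -output $output
}
Else
{
$output = "Copy to $archive successful - both copies of $file are the same"; SendOutput -output $output
}
}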
#$Source.Add("$dest")

 

# compress from local temp to network location for long term storage and log.
#write-host "Beginning Archive process" -ForegroundColor Cyan
<#
# File Compression.

$TotalCompressTime = measure-command {
<# Source Information should be located above.
[System.Collections.ArrayList]$Source = @( # Add or remove files
"C:\users\cstone42\desktop\desktop"
)
$Source.Add("C:\users\cstone42\desktop\gifs")
#

$bun = 0
foreach ($thing in $Source)
{
$bun++
$destination = "$archive\Backup$bun.zip"
$arctime = Measure-Command {
If(Test-path $destination){Remove-item $destination}
$compressionLevel = [System.IO.Compression.CompressionLevel]::NoCompression
Add-Type -assembly "system.io.compression.filesystem"
[System.IO.Compression.ZipFile]::CreateFromDirectory($Thing, $destination, $compressionLevel, $true)
} #End Measure-command
$ThisMany = $arctime.Seconds
$output = "$thing took $ThisMany seconds to compress."; SendOutput -output $output
}
} # End of Measure-Command
$TCTS = $TotalCompressTime.Seconds
$output = "Total compression time: $TCTS"; SendOutput -output $output

#get-childitem $dest -Recurse | Copy-Item -Destination $archive -force | Out-Null
$output = $?
$output = 'network archive complete '; SendOutput -output $output
#>

#housekeeping
$Now = Get-Date
#define amount of days
$Days = "1"
#folder where files are located
$TargetFolder = $archive
#define extension
$Extension = "*"
#LastWriteTime parameter based on $Days
$LastWrite = $Now.AddDays(-$Days)

#get files based on lastwrite filter and specified folder
$Files = Get-Childitem "e:\temp\*.*" -Recurse | Where {$_.LastWriteTime -lt "$LastWrite"}


if ($Files -ne $NULL)
{
foreach ($file in $Files)
{
$output = "Removing file $file"; SendOutput -output $output
Remove-Item $file.FullName -force #-WhatIf
}
} # End if

} #End Of Measure-Command

$output = 'Ending Backup Script'; SendOutput -output $output
$output = "Total mongo db backup time: $totalScriptTime"; SendOutput -output $output

# EOF

Toby
5 - Atom

@,

AWESOME! Thanks for posting this :)

dwalker3rd
6 - Meteoroid

for those of you forcing the AlteryxService to stop ...

(because you can't always wait four hours for a workflow to complete)

 

if the AlteryxService is not stopping and you're running a worker on the same machine as the controller (and thus embedded mongodb), it's most likely because there are jobs running.  the service will not run down until those jobs complete.  so if you're going to kill anything, you should kill all the instances of AlteryxEngineCmd.exe.  The value of "Workflows allowed to run simultaneously" in System Settings > Worker > General determines the number of instances of AlteryxEngineCmd.exe.  however, be forewarned.  This is not without consequences as well.  you will be terminating running jobs and, occasionally, i've seen the schedules for those jobs be disabled as a result.
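
A rough PowerShell sketch of the approach described above, assuming the engine process shows up as `AlteryxEngineCmd` (the executable name without `.exe`, which is what `Get-Process` expects). The function name `Stop-EngineProcess` is invented for this example. It supports `-WhatIf`, so you can see what would be killed before terminating any running jobs.

```powershell
# Hypothetical helper: terminate every running engine process and report how many were found.
function Stop-EngineProcess {
    [CmdletBinding(SupportsShouldProcess)]
    param([string] $Name = 'AlteryxEngineCmd')

    # @() forces an array so .Count works for zero or one match as well
    $engines = @(Get-Process -Name $Name -ErrorAction SilentlyContinue)
    foreach ($p in $engines) {
        if ($PSCmdlet.ShouldProcess("$($p.ProcessName) (PID $($p.Id))", 'Stop-Process -Force')) {
            Stop-Process -Id $p.Id -Force
        }
    }
    return $engines.Count
}

# Dry run: lists the engine processes that would be stopped without touching them
Stop-EngineProcess -WhatIf
```

Running it without `-WhatIf` kills the processes, with the consequences for running jobs and schedules noted above.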

dwalker3rd
6 - Meteoroid

what now also concerns me is executing a backup of the embedded mongo instance across our alteryx server environ which consists of three machines:  an 8-core runs the controller, the gallery and a worker; two 4-cores run workers only.  i've not worried about the remote workers when backing up mongo.  i've only stopped AlteryxService on the 8-core.  but recently, i've received conflicting information from Alteryx support regarding a clean shutdown.  i've heard both that shutting down the workers doesn't matter.  and i've heard that shutting them down *first* is critical to having a "clean" shutdown.  i assume this means that doing otherwise risks corrupting the content of the embedded mongodb instance.  (which is it?)

 

if shutting down the service on ALL the nodes in your alteryx server environ is a requirement, then an automated backup just got more difficult.  and, again - if this is true, AND there's no way to backup mongo while the service is running, then mongo is not a good choice.  b/c in addition to the complexity of having to stop the service on EVERY node, if you are multi-node, it's probably b/c your load is significant and you probably have no daily windows for a backup (and a weekly one is probably a challenge - is for me).

 

i know alteryx server is going to postgres soon.  that's a good thing.  but in the meantime, what's an alteryx server admin with multiple nodes and a workflow schedule that's booked solid all day to do?  what's alteryx's suggestion @KevinP ?

jason_scarlett
10 - Fireball

FYI,

We added this to get rid of logs/zip files older than 14 days.

The PushD/PopD temporarily maps a network drive.

 

PushD "%LogDir%" &&(forfiles -s -m *.log -d -14 -c "cmd /c del /q @path") & PopD
PushD "%NetworkDir%" &&(forfiles -s -m *.7z -d -14 -c "cmd /c del /q @path") & PopD

jason_scarlett
10 - Fireball

Another addition.

We had some jobs running overnight when the backup was triggered. This caused the gallery to go down. Adding this prevented the issue.

Basically it checks to see if there is an active job running, and if so skips the backup. We are okay missing a backup once in a while.

 

 

:: if a job is running (i.e. AlteryxEngineCmd.exe task exists) then abort backup
tasklist | FIND "AlteryxEngineCmd.exe"
IF errorlevel 0 IF NOT errorlevel 1 GOTO SystemError

 

 

... as suggested by @DanC on this post: https://community.alteryx.com/t5/Alteryx-Server-Knowledge-Base/Alteryx-Service-Stuck-in-Stopping-Sta...

 

Inactive User
Not applicable

@KevinP Can a restore be done on another gallery/environment without changing any configuration? Just restoring content? For example, syncing a UAT environment from a Production one or vice versa.

KevinP
Alteryx Alumni (Retired)

@Inactive User These instructions and example script are intended for a single node environment where the backup and restore are to the same server. If you need to restore to a new/different server please reach out to support and we can assist you as there are a number of potential issues that have to be accounted for.

nishant_saxena
7 - Meteor

Hi All, we have a distributed environment with workers on separate servers, and the Controller and Gallery reside on one server. When taking a server backup, do we need to stop the Alteryx Service on the workers too, or can we just stop the service on the Controller node and take the MongoDB backup?

lepome
Alteryx Alumni (Retired)

@nishant_saxena 
The article has been updated to answer your question.  tl;dr  YES

mchamps
7 - Meteor

@lepome @KevinP

 

I noticed the initial batch script in the first post has URL-encoded percent signs (%25 instead of %), probably due to an issue with the community forum markup. This caused the script to fail.


Find:

%25

 

Replace with:

%

 

Please update the script at your convenience. I am using the latest version of Firefox.

 

I highlighted some examples of the issue:

 

1.png

 

r4upadhye
11 - Bolide

thanks for the update,

Aguisande
15 - Aurora

Hi @KevinP 

Very nice article.

Is there any chance this script/batch file could be put on GitHub, so we can all have access to it and its updates?

I think this will help a lot of people dealing with server backups.

Thanks in advance!

seinchyiwoo
Alteryx Alumni (Retired)

@Aguisande good suggestion. @KevinP It would be nice to have these scripts on GitHub, as I'm also hearing increasing requests from my users 🙂

yuriy
8 - Asteroid

@Coxta45, thank you for the comment about the quotes for the copy command. I wish I had read the comments before pulling my hair out over why it was not working...

@KevinP - could you please update the article to include quotes for the copy command?

 

Another thing I added is creating folders for backup locations if they do not exist.

`if not exist "%LogDir%" mkdir "%LogDir%"`

`if not exist "%NetworkDir%" mkdir "%NetworkDir%"`

 

Thank you

 

KevinP
Alteryx Alumni (Retired)

@Aguisande and @seinchyiwoo Thanks for the suggestion. Since I am no longer part of the support team, I haven't been actively maintaining this script like I used to. However, I think it is highly valuable to have this in a public git repo. I don't use GitHub, though, so I created a repo in GitLab instead. I did try to maintain the version history as appropriate for the script (based on the older versions I still had copies of). You can find the repo at:

https://gitlab.com/kpowney/alteryxserverbackupscript

@yuriy You shouldn't need to quote the paths for the copy commands... unless maybe one of your variables resolves to a path containing spaces or a character that needs escaping... The folder check is a good idea. I may add something along these lines to ensure all referenced paths exist, and create them or exit with an error as appropriate.
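
The "create it or exit with an error" idea could look something like the following in PowerShell. The example script itself is a batch file, so this is a sketch of the logic rather than a drop-in change; `Confirm-Directory` is a made-up helper name and the paths in the usage comment are placeholders.

```powershell
# Hypothetical helper: make sure every folder exists, creating it if needed.
# Returns $false as soon as one folder cannot be created.
function Confirm-Directory {
    param([Parameter(Mandatory)] [string[]] $Path)
    foreach ($dir in $Path) {
        if (Test-Path -LiteralPath $dir -PathType Container) { continue }
        try {
            New-Item -ItemType Directory -Path $dir -Force -ErrorAction Stop | Out-Null
        } catch {
            Write-Warning "Could not create '$dir': $($_.Exception.Message)"
            return $false
        }
    }
    return $true
}

# Example use with placeholder paths:
# if (-not (Confirm-Directory -Path 'D:\Backups\Logs', '\\nas\share\AlteryxBackups')) { exit 1 }
```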

Aguisande
15 - Aurora

Awesome and Thanks @KevinP !

I'll take a look at it.

Thanks again

yuriy
8 - Asteroid

@KevinP Thank you for the comment and for getting the script into a git repo. However, I do not have any spaces in my paths. As for special characters, I have \:._ which I think should be acceptable.

smysnbrg
8 - Asteroid

I just read the spoiler for Multi-Node environments.  Does anyone have any suggestions on where to start on coordinating the backups and service restarts on multiple servers?
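
One possible starting point, sketched in PowerShell under several assumptions: PowerShell remoting (`Invoke-Command -ComputerName`) is enabled between the nodes, the service is named `AlteryxService` on every node, and workers are stopped before the controller and started after it (earlier comments in this thread report conflicting advice on whether the order matters). The node names and the `Get-BackupPlan` helper are placeholders. The sketch only builds and prints the ordered plan; the commented lines show how each step might be executed.

```powershell
# Hypothetical node names; replace with your own.
$controller = 'ALTERYX-CTRL'
$workers    = @('ALTERYX-WRK1', 'ALTERYX-WRK2')

# Build the ordered list of steps: workers down, controller down, dump, controller up, workers up.
function Get-BackupPlan {
    param([string] $Controller, [string[]] $Workers)
    $plan = @()
    foreach ($w in $Workers) { $plan += [pscustomobject]@{ Node = $w; Action = 'Stop' } }
    $plan += [pscustomobject]@{ Node = $Controller; Action = 'Stop' }
    $plan += [pscustomobject]@{ Node = $Controller; Action = 'Dump' }
    $plan += [pscustomobject]@{ Node = $Controller; Action = 'Start' }
    foreach ($w in $Workers) { $plan += [pscustomobject]@{ Node = $w; Action = 'Start' } }
    return $plan
}

$plan = Get-BackupPlan -Controller $controller -Workers $workers
$plan | Format-Table -AutoSize

# To execute a Stop or Start step remotely (requires remoting and admin rights on the node):
# Invoke-Command -ComputerName $step.Node -ScriptBlock { Stop-Service AlteryxService }
# Invoke-Command -ComputerName $step.Node -ScriptBlock { Start-Service AlteryxService }
# The 'Dump' step is the emongodump command from the article, run on the controller.
```

Keeping the plan as data makes it easy to log each step and to stop early if any node fails to shut down.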

ssidhpura05
6 - Meteoroid

I used this script in our newly built environment and it worked as expected. Thank you @KevinP and all for this. I'm trying to include SMTP settings in the bat file so I can receive a notification once the script has completed, but I have been unable to make it work. Can someone suggest or share a code snippet showing how to add SMTP/email notification to the existing bat file?

KevinP
Alteryx Alumni (Retired)

@ssidhpura05 There are multiple ways you can add email notifications to this backup script. However, all of them require an external application to assist in sending the email, and the best option will likely depend on your particular email server's requirements and restrictions.

The three core methods I would try would be as follows:

 

  1. Use 'telnet' to connect to the smtp server and send the email.
    1. There is an article here on community called Email SMTP Troubleshooting that goes over how to troubleshoot email issues using telnet and explains the commands. These would translate pretty directly to a batch script.
    2. There are also plenty of examples online of using this method and the PowerShell method up next.
  2. Use a PowerShell script
    1. This requires a separate email script using something like the Send-MailMessage cmdlet in PowerShell and then calling that script from within the backup batch file.
    2. There is a decent example of this in the comments of this stack overflow post
  3. Use a CLI mail application such as CMail, mailsend, SendEmail, etc...
    1. This is probably the most flexible option as the mail application can provide extended capabilities over using telnet or PowerShell depending on the specific application chosen and your needs
    2. Implementation will be specific to the application in question though and finding examples to help may be a challenge
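
As an illustration of option 2, a minimal PowerShell sketch follows. The server name, addresses, port, and the `New-BackupMailParams` helper are placeholders invented for this example. `Send-MailMessage` is the built-in cmdlet mentioned above, although Microsoft's documentation now marks it as obsolete, so check whether it is acceptable in your environment.

```powershell
# Hypothetical helper: build the parameter set for Send-MailMessage.
function New-BackupMailParams {
    param(
        [Parameter(Mandatory)] [bool] $Succeeded,
        [string] $LogPath
    )
    $status = if ($Succeeded) { 'SUCCEEDED' } else { 'FAILED' }
    $params = @{
        From       = 'alteryx-backup@example.com'   # placeholder address
        To         = 'server-admins@example.com'    # placeholder address
        Subject    = "Alteryx Server backup $status on $([Environment]::MachineName)"
        Body       = "The MongoDB backup script finished with status: $status."
        SmtpServer = 'smtp.example.com'             # placeholder server
        Port       = 25
    }
    # Attach the backup log only when a path was given and the file exists
    if ($LogPath -and (Test-Path -LiteralPath $LogPath)) { $params.Attachments = $LogPath }
    return $params
}

$mail = New-BackupMailParams -Succeeded $true
# Send-MailMessage @mail     # uncomment once the placeholders are replaced
```

Saved as a script (say, a hypothetical `C:\Scripts\Send-BackupMail.ps1`), it could be called from the end of the batch file with `powershell.exe -NoProfile -ExecutionPolicy Bypass -File "C:\Scripts\Send-BackupMail.ps1"`.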

jzacharuk
5 - Atom

Are there any options to take a backup without stopping the Alteryx Service?

 

I assume that stopping the Alteryx Service means that none of the scheduled jobs will run while it is shut down. Wouldn't this cause a major headache for critical jobs, especially as scheduled jobs become more and more frequent?

RE5260
8 - Asteroid

As suggested I've created the batch file by adding 3 lines.

 

RE5260_0-1677146841319.png

When I try to run the batch file, it throws an error.

 

RE5260_1-1677146901621.png

 

Please help.

 

Josue_Venancio
7 - Meteor
 

 Help with backup and restore and with exams. Great community support material.