Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
carlosteixeira
15 - Aurora

Scenario

LAST UPDATE - 2024-05-24 - Hi, as you may know, Alteryx has deprecated some tools and released new product versions since I created this process. Two years after building it, I managed to make it work in version 2023.2.1.51 Patch 1. It should work on your side too, but you will probably need to make some changes in the workflow to reflect aspects of your environment. Feel free to reach out via direct message, post your question on this article, or find me on LinkedIn.

 

Alteryx Server has functionality, configured in Alteryx System Settings, that cancels jobs once they have been running for a certain time limit.

 

carlosteixeira_0-1644606965770.png

 

Above, we can define, in seconds, the maximum amount of time a workflow is allowed to run. In the environment we were working on, this limit was 3,600 seconds (1 hour).

 

Problem

 

In this scenario, any scheduled workflow that ran longer than the configured 3,600 seconds would be automatically canceled by the system.

 

It turns out that this setting only works for workflows that are scheduled in the Gallery. If the user runs the workflow manually through the Gallery, this configuration does not work.

 

Users would start a workflow manually and leave it running for hours.

 

In large environments with many users doing this, the execution queue becomes a problem: with many manually started workflows running at the same time in the Gallery, we ended up with a queue of more than 20 workflows waiting to be executed, even though they were scheduled.

 

For us environment administrators, this is almost impossible to manage, because we do not know which workflows can safely be canceled.

 

carlosteixeira_1-1644607055377.png

Solution

 

When we realized this was impacting our environment, the client asked us to study a possible solution. With the help of my friends @Thableaus and @marcusblackhill, we arrived at the solution below.

 

We created a workflow that monitors what is running on Alteryx Server and, if it identifies a job that has been running for more than a certain time, whether scheduled or manual, cancels that job via a Download tool call.

 

Cancel_Manual_Job_1.png

 

Below, I will walk through the workflow, which is divided into four parts.

 

Part 1

 

In this first part, we query MongoDB to identify which sessions are active in the database and, using Filter/Summarize/Join tools, keep just one active session to be used as a credential at the end of the four parts.

 

Along with this section, I also include the Gallery URL.
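The session selection in this part can be sketched in Python. This is a hypothetical illustration rather than the workflow itself: the field names (`sessionId`, `expired`, `dateCreated`) are assumptions about the MongoDB schema and should be checked against your own Gallery database.

```python
from datetime import datetime

def pick_active_session(sessions):
    """Keep a single session to use as a credential: drop expired records
    and take the most recently created one (mirroring the
    Filter/Summarize/Join logic described above)."""
    active = [s for s in sessions if not s.get("expired", True)]
    if not active:
        return None
    return max(active, key=lambda s: s["dateCreated"])["sessionId"]

# Stand-in documents shaped like session records; in the workflow these
# come from a query against the Gallery's sessions collection.
docs = [
    {"sessionId": "s1", "expired": False, "dateCreated": datetime(2024, 5, 1)},
    {"sessionId": "s2", "expired": False, "dateCreated": datetime(2024, 5, 2)},
    {"sessionId": "s3", "expired": True,  "dateCreated": datetime(2024, 5, 3)},
]
print(pick_active_session(docs))  # -> s2
```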

 

Cancel_Manual_Job_2.png

 

Part 2

 

In the second part of the flow, we again query the MongoDB database directly, this time to identify which workflows are running at that moment.

 

In the example below we can identify 1 workflow being executed.

 

Using filters and formulas, we calculate how long each workflow has been running. The first filter checks whether the workflow is in "running" status; the Formula tool then calculates its execution time; and the second filter checks whether it has been running longer than the stipulated time.

 

In the third filter, at the client's request, we handle workflows that are exempt from the cancellation rule: here we list the names of the workflows that count as exceptions.
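The three filters can be condensed into one decision function. A minimal Python sketch, assuming each job record exposes a status, a start timestamp, and a workflow name (the field values and workflow names here are illustrative, not the actual MongoDB schema):

```python
from datetime import datetime, timezone

LIMIT_SECONDS = 3600            # the stipulated limit (1 hour in this article)
EXCEPTIONS = {"NightlyETL"}     # hypothetical names exempt from cancellation

def should_cancel(job, now=None, limit=LIMIT_SECONDS, exceptions=EXCEPTIONS):
    """Filter 1: must be in 'running' status. Formula + Filter 2: elapsed
    time over the limit. Filter 3: name not on the exception list."""
    if job["status"] != "running":
        return False
    now = now or datetime.now(timezone.utc)
    elapsed = (now - job["start_time"]).total_seconds()
    if elapsed <= limit:
        return False
    return job["name"] not in exceptions

job = {"status": "running",
       "start_time": datetime(2024, 5, 24, 8, 0, tzinfo=timezone.utc),
       "name": "AdHocReport"}
# 90 minutes elapsed, over the 1-hour limit, not an exception:
print(should_cancel(job, now=datetime(2024, 5, 24, 9, 30, tzinfo=timezone.utc)))  # -> True
```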

 

Cancel_Manual_Job_3.png

 

Part 3

 

At this stage, again querying MongoDB, we identify the user responsible for the flow (the owner) and create alerts, sent to the administrators and to the owner, explaining that the workflow was canceled because it ran longer than the time stipulated by company guidelines.
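The alert itself can be sketched with Python's standard library. The recipients and wording below are illustrative; the actual workflow builds its notifications with Alteryx tools:

```python
from email.message import EmailMessage

def build_cancel_alert(owner, admins, workflow, minutes_run, limit_minutes):
    """Compose the notification sent to the workflow owner and the
    administrators after a cancellation."""
    msg = EmailMessage()
    msg["Subject"] = f"Workflow canceled: {workflow}"
    msg["To"] = ", ".join([owner] + admins)
    msg.set_content(
        f"Your workflow '{workflow}' ran for {minutes_run} minutes, exceeding "
        f"the {limit_minutes}-minute limit set by company guidelines, and was "
        "canceled automatically."
    )
    return msg  # send with smtplib.SMTP(...).send_message(msg)
```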

 

Cancel_Manual_Job_4.png

Part 4

 

In this last step, we create the variables needed to build an HTTP call and send the required credentials, so that the system executes the command as if we were canceling the workflow manually.

 

So when a workflow is identified with a longer execution time than that stipulated by the company, it will be canceled.
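The cancellation call can be sketched as follows. Note that this imitates the Gallery's own Cancel button rather than a documented API, so the URL path and cookie name below are placeholders: capture the real values from your own Gallery with the browser's dev tools (Network tab).

```python
import urllib.request

# PLACEHOLDER values: there is no documented cancel endpoint, so the real
# path and cookie name must be captured from your Gallery's Cancel button.
CANCEL_PATH = "/gallery/api/jobs/{job_id}/"
SESSION_COOKIE = "sessionId"

def build_cancel_url(gallery_url, job_id):
    """Assemble the URL that a manual Cancel click would hit."""
    return gallery_url.rstrip("/") + CANCEL_PATH.format(job_id=job_id)

def cancel_job(gallery_url, job_id, session_id):
    """Send the HTTP DELETE, authenticated with the session from Part 1."""
    req = urllib.request.Request(
        build_cancel_url(gallery_url, job_id),
        method="DELETE",
        headers={"Cookie": f"{SESSION_COOKIE}={session_id}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```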

 

Cancel_Manual_Job_5.png

 

Implementation

 

We implemented this workflow as follows.

 

On Alteryx Server, we scheduled the workflow from Alteryx Designer to run every 5 minutes. Note that the complete workflow executes very quickly, in under a minute.

 

carlosteixeira_7-1644607598470.png

 

carlosteixeira_8-1644607611116.png

 

In this way, the workflow does not tie up one of the execution slots configured in Alteryx System Settings, which define how many workflows can run simultaneously.

 

I'll leave the workflow attached in case any of you want to implement it, study it, or even improve the process in your own environment.

 

Tip: in the environment where this was implemented, I also included some alerts to give administrators advance warning, for example an email when a workflow reaches 90% of the pre-set time. This way, the administrator has an early idea of which workflow(s) will be canceled.
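That early-warning check is simple arithmetic; a minimal sketch:

```python
def near_limit(elapsed_seconds, limit_seconds=3600, warn_fraction=0.9):
    """True once a running job has consumed the given fraction of its
    allowed time, so administrators can be warned before cancellation."""
    return elapsed_seconds >= warn_fraction * limit_seconds

print(near_limit(3300))  # -> True (3300 s is past 90% of the 3600 s limit)
```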

 

Another tip I used in the environment in question: we disabled the job-cancellation configuration in Alteryx System Settings.

 

Until this feature is implemented in new versions of Alteryx Server, so that the cancellation rule applies regardless of how a workflow is started, this solution has served the customer well and can be implemented quickly.

 

The client's team also implemented some processes for better governance of the environment, such as a request procedure for adding workflows to the exception list so they can run at any time, regardless of the stipulated limit.

 

Workflows of great relevance to the business were thus added to the exception list, and together with each user we also evaluated the best time for them to be scheduled or executed.

 

That's it. If you have any questions or curiosity about this workflow and how to implement it, feel free to contact me; I'll be happy to help.

 

Cheers

 

Carlos A Teixeira - Alteryx ACE

Comments
MattBenj
9 - Comet

Thanks for sharing @carlosteixeira! I like the added touch of sending the user an email alerting them of the cancellation. You mentioned:

Until this feature is implemented in new versions of the Alteryx Server so that regardless of how the workflow is started it falls into this cancellation rule, this is a solution that has served the customer well and can be implemented quickly.


Do you know if this enhancement is on the roadmap for future versions?

 

Thanks!

carlosteixeira
15 - Aurora

Hey @MattBenj, how are you?
What I know is that there are some requests to include this cancellation for both scheduled and manual runs, but I don't know when, or whether, it will be implemented in the next versions.


We hope so.


Here are some links with this idea requested by some users:

 

https://community.alteryx.com/t5/Alteryx-Server-Ideas/Configuration-to-set-the-Timeout-of-a-manual-r...

 

Thanks

fmvizcaino
17 - Castor

Great content @carlosteixeira ! Very helpful

NicolasSz
11 - Bolide

Thanks a lot for your solution. I guess this will do the trick until Alteryx supports it natively.

This is quite a basic feature for a server product!

Ariharan
11 - Bolide

Hi Carlo,

 

Thanks a lot for your idea. In our case, the completion time is not updated for running/queued jobs. We tried to use the Alteryx Server usage report data parser (AS_Results & AS_Queue) and found that it does not contain execution start time and queue time values for running/queued jobs; while a job is running or queued, these are captured as null.

In the example given, four scheduled jobs are running on the server and one manual job triggered by a user sits in a queued state for more than 30 minutes before it starts running. However, in MongoDB, the moment a job is triggered is captured as its start time (not accounting for queue time). If we use the AS_Queue start time in our workflow, we end up with a mess: since that manual job was queued for 30 minutes and has been running for only 3 hours 30 minutes, it is killed by our workflow, when it should only be killed after 4 hours of actual execution.

 

  • How can we determine the total queue time and the execution start time for running/queued jobs?
  • How do I kill a job automatically after four hours while taking queue time into consideration?

Thanks & Regards, 

Ariharan.R

Ross_K
7 - Meteor

Hi @carlosteixeira,

 

Love the solution. One issue we encountered is that if all of the nodes are bogged down, this workflow cannot run :P

So instead we are using Python to monitor the current active workflows by querying the queue on Mongo. We then need to shut down any that exceed the time and send the appropriate notification emails.

Querying Mongo is easy, constructing the url is easy, but we don't seem to be able to authenticate to cancel it.

For the normal API endpoints we use the api keys, generate an OAuth token and use that. From your workflow it looks like you are using basic auth with a username and password?

We have tried basic auth with API keys and with username/password login credentials, both for the user who owns the workflow and for an admin; the result is always the same: a 401 error, {"data":null,"exceptionName":"UnauthorizedException","innerExceptionMessage":"","message":"User is not authorized to perform operation."}.

Any suggestions on which credentials we should be using here or any specific permissions they need?

 

Thanks :)

Arti2893
7 - Meteor

Can you please explain in detail what parameters are being passed to the Download tool?

 
 
 

cancel.PNG

Andrej_k
6 - Meteoroid

Hello,

 

great approach.

 

Good news: with minor adjustments, it is working on the 2023.1 server version.

 

We are using it to also cancel scheduled workflows that run longer than expected (e.g. due to an issue with a database). The weak point is the sessions input. We found that "active" sessions in the sessions table are not necessarily still active; the table just keeps records of all sessions that were once active. So if all sessions on the Gallery side have expired, we still have records of an "active" session; upon sending the HTTP request to delete a job, the server responds with "invalid_session" and obviously nothing happens. So outside working hours this solution does not work.

 

Did anyone find a workaround?

 

I am thinking about two options: open a new session on demand (though I am not sure how to do that), or prolong an existing session (by doing some fake activity).

Andrej_k
6 - Meteoroid

Just to post an update:

I implemented the first approach: if there are any jobs to be canceled, the workflow enables a Control Container with a Python script that authenticates to the Gallery (requests, requests_ntlm libraries) and returns a valid session_id, which is then used in the Download tool to cancel the job.

I still consider such an API endpoint a must-have for the server...

Hi Andrej,

 

Thank you for exploring it. Is your Download tool still able to cancel the job? This flow was working for me without errors, but recently the tool has stopped cancelling the job. I checked the API documentation in Alteryx Server Swagger but can't find an endpoint to cancel a job. Do you know if they no longer support it?

 

I appreciate your help in advance.

 

Thank you

 

Andrej_k
6 - Meteoroid

Hi Arman,

 

such an endpoint does not exist in the API, so Swagger won't help. We send an HTTP DELETE request to "imitate" someone manually pressing the Cancel button in the Gallery via the browser.

 

The first thing that comes to my mind: check the response from the Download tool (if the HTTP request was not correct, the server will respond somehow). E.g. it could be invalid_session, as in my case outside working hours.

If some updates were installed on the server side, the HTTP request might need different headers. In that case, I suggest checking which headers are sent when you press the Cancel button (dev tools in the browser, Network tab) and updating the Text Input with those.

 

Hope this helps :)

 

Thank you for your response, Andrej. I checked and there were some changes on the server side. I changed the parameters accordingly, but in my case I see a 401 Unauthorized error. I am using a username and password for authentication.

 

I would appreciate if you have any ideas and thank you for trying to help.

 

Thanks again

fmvizcaino
17 - Castor
17 - Castor

Hi @Andrej_k ,

 

I totally understand if you can't, but since you have adapted the solution from the great @carlosteixeira, it would be super nice if you could share your solution with everyone else here. 😁

 

I'm not asking for myself, just for the benefit of everyone else who may need this fantastic feature (which should be native, in my opinion 🙄)

Andrej_k
6 - Meteoroid

Hi all,

 

@fmvizcaino unfortunately there are some restrictions here and I am unable to attach the workflow, only screenshots and a code snippet. So the Python script is the following:

[1]:

# List all non-standard packages to be imported by your 
# script here (only missing packages will be installed)
from ayx import Package
Package.installPackages(['requests','requests_ntlm'])

 

[2]

from ayx import Alteryx
import pandas as pd
import requests
from requests_ntlm import HttpNtlmAuth

def authenticate_and_get_session_df(username, password, url):
    headers = {
        'Content-Type': 'application/json',
        'Accept': 'application/json',
    }
    data = {
        "scheme": "windows",
        "parameters": [{"name": "updateLastLoginDate", "value": True}]
    }

    ntlm_auth = HttpNtlmAuth(username, password)

    try:
        response = requests.post(url, headers=headers, json=data, auth=ntlm_auth, verify=False)
        response.raise_for_status()
        response_data = response.json()
        df = pd.DataFrame(response_data, index=[0])
        return df
    except Exception as e:
        print(f"Authentication failed for URL {url}: {e}")
        return pd.DataFrame()

# Attempt to read credentials from Alteryx, handle exceptions
try:
    credentials_df = Alteryx.read("#1")
except Exception as ex:
    print(f"Error reading credentials from Alteryx: {ex}")
    credentials_df = pd.DataFrame()

if credentials_df.empty:
    print("No credentials found. Exiting.")
    session_df = pd.DataFrame({'sessionId': [None]})
else:
    # Extract values from the single row
    row = credentials_df.iloc[0]
    username = row['username']
    password = row['password']
    url = row['url']

    # Call the authentication function with extracted values
    session_df = authenticate_and_get_session_df(username, password, url)
    if not session_df.empty:
        print(f"Authentication successful for URL {url}!")
        print(session_df)
    else:
        print(f"Authentication failed or session data not retrieved for URL {url}.")
    
#export to first anchor
Alteryx.write(session_df, 1)



The values are taken from the browser's Inspect tool (you can also use Fiddler to track all requests and responses). We need to find the step where the session is updated and capture the values username, password, and url; those are sent as input to the Python script, and the output is the sessionId value. The rest is the same as in the original concept. I assume this approach only works for the same authentication type we have.

 

@parmarar sorry, this part is hard to troubleshoot; it may be worth contacting Alteryx Support. The only idea I have is to check all the columns returned by the Download tool. For a successful cancellation I also get a 401 Unauthorized response in the download headers, but the job is still canceled, and the download data column is empty when all is OK. An alternative option is again to work around it with Python scripts.

 

carlosteixeira
15 - Aurora

Hello everybody. How are you?


After a long period (2 years), I revisited the flow and updated this post. I included a new workflow with some adjustments and improvements due to new versions and discontinued tools.

 

You will see that in the new workflow, the MongoDB connection is made via ODBC instead of the MongoDB connector.

 

This new workflow works in version 2023.2.1.51 Patch 1.

 

If you have any questions, I am available.

 

Cheers

Carlos A Teixeira - Alteryx ACE