Engine Works

carlosteixeira · ‎02-18-2022

Scenario

LAST UPDATE - 2024-05-24 - Hi, as you know some tools are deprecated by Alteryx, and we have new product versions since I created this process. Two years after I did it, I managed to make it work in the new version 2023.2.1.51 Patch 1 - This could work on your side, but you will probably need to make some changes in the workflow to reflect some aspects of your environment. Feel free to reach out to me using a direct message, or putting your question on this post or, you can find me on the Linkedin

Alteryx Server has functionality that we can configure in Alteryx System Settings that cancels jobs that have been running for a certain time limit.

Above, we can define, in seconds, the maximum amount of time allowed for a workflow to be executed. For the environment we were working on, this limit was 3600 seconds (1 hour).

Problem

In this scenario, any workflow that was scheduled and executed for a time longer than the configured 3600 seconds would be automatically canceled by the system.

It turns out that this setting only works for workflows that are scheduled in the Gallery. If the user runs the workflow manually through the Gallery, this configuration does not work.

The user put the workflow to run manually and left it running for hours.

In this scenario, in large environments, with many users doing this we have the problem of the execution queue. Many workflows running at the same time in the Gallery, manually generating a queue of more than 20 workflows waiting to be executed, even though they are scheduled.

For us environment administrators this is almost impossible to manage because we don't know which workflows can be canceled or not.

Solution

When we realized that this was impacting our environment, the client asked us to study a possible solution. With the help of my friends @Thableaus and @marcusblackhill we managed to arrive at the solution below.

We created a workflow that monitors what is running on the Alteryx Server and if it identifies a job that has been running for more than a certain time, regardless of whether it is scheduled or manual, it cancels the job via a Download tool call.

Below I will explain a little bit of the flow that is divided into 4 parts.

Part 1

In this first part, we went to MongoDB to identify which sessions we had active within the database, and using Filters/Summarize/Joins we got only 1 active session to be used as a credential at the end of the 4 processes.

Along with this section, I also include the Gallery URL.

Part 2

In the second part of the flow, we also went directly to the base of MongoDB to identify which workflows are running at that moment.

In the example below we can identify 1 workflow being executed.

Using filters and formulas, we were able to calculate how long this workflow has been running. The first filter identifies if the workflow is in “running” status, goes through the formula tool, and calculates the execution time of this workflow. In the second filter, we check if the workflow is running longer than the stipulated time.

In the third filter, and here it was also a request made by the client, we included the workflows that can evade the cancellation rule. For that, in this filter, we include the names of the workflows that enter in the exception cases.

Part 3

At this stage, also going to MongoDB, we were able to identify who is the user responsible for that flow (owner) and create alerts that are sent to administrators and to the user himself/herself that the workflow executed was canceled because it was running longer than the stipulated time, per company guidelines.

Part 4

In this last step, we create the variables necessary for the system so that we can create an HTTP call and send the necessary credentials for the command to be executed by the system as if we were canceling the workflow manually.

So when a workflow is identified with a longer execution time than that stipulated by the company, it will be canceled.

Implementation

For this workflow to work, we implemented it as follows.

Inside the Server (Alteryx Server) we include the workflow in a way scheduled by the Alteryx Designer to run every 5 minutes. Note that the execution time of the complete workflow is very fast, about less than a minute.

In this way, the workflow does not consume one of the execution slots that is configured in Alteryx System Settings with the number of workflows that can be executed simultaneously.

I'll leave the workflow attached in case any of you want to implement or study and even improve the process in your environment.

Tip: in the environment where this was implemented, I even included some alerts to anticipate the information for the administrators, for example, receiving an email if the workflow reaches 90% of the pre-set time. This way, the administrator can have an early idea of which workflow(s) will be canceled.

Another tip, which I used in the environment in question is that we disable the configuration in Alteryx System Settings to cancel the jobs there.

Until this feature is implemented in new versions of the Alteryx Server so that regardless of how the workflow is started it falls into this cancellation rule, this is a solution that has served the customer well and can be implemented quickly.

Some processes were implemented by the client's team in order to have better governance of the environment, such as requesting the inclusion of workflows in exceptions so that they can run at any time regardless of the stipulated time.

With this workflow process of great relevance to the business, they were included in this list of exceptions, and we also evaluated together with the user the best time for it to be scheduled or executed.

That's it. If you have any questions or curiosity about this workflow and how to implement it feel free to contact me, I'll be happy to help.

Cheers

Carlos A Teixeira - Alteryx ACE

MattBenj · ‎03-04-2022

Thanks for sharing @carlosteixeira! I like the added touch of sending the user an email alerting them of the cancelation. You mentioned:

Until this feature is implemented in new versions of the Alteryx Server so that regardless of how the workflow is started it falls into this cancellation rule, this is a solution that has served the customer well and can be implemented quickly.

Do you know if this enhancement is on the roadmap for future versions?

Thanks!

carlosteixeira · ‎03-07-2022

Hey @MattBenj how are you?
Look, what I know is that there are some requests about including this cancellation for both scheduled and manual, but I don't know when and if it will be implemented in the next versions.

We hope so.

Here are some links with this idea requested by some users:

https://community.alteryx.com/t5/Alteryx-Server-Ideas/Configuration-to-set-the-Timeout-of-a-manual-r...

Thanks

fmvizcaino · ‎04-20-2022

Great content @carlosteixeira ! Very helpful

NicolasSz · ‎04-25-2022

Thanks a lot for your solution. This will I guess the trick until Alteryx supports it.

This is quite a basic feature for a server product !

Ariharan · ‎05-12-2022

Hi Carlo,

Thanks a lot for your idea. In our case completion time is not updated for running/queued jobs. We tried to use the Alteryx server usage report data parser (As Result & As queue) and we found that it does not contain execution start time and queue time values for running/queued jobs. When jobs are running/queued, it is captured as null.

In the example given, there are four scheduled jobs running in the server at the moment and one manual job is being triggered by the user and is in a queued state for more than 30 minutes to start running. However, in MongoDB, when a job is triggered that time is captured as the start time (not considering the queue time). If we consider the start time of As_Queue in our workflow, we ended up with a mess. Since that manual job is queued for 30 minutes and running for only 3.30 hours, it is being killed by our workflow. It should only be killed after 4 hours.

How did we determine the total queue time & the execution start time for the running/queued jobs?
How do I kill this job automatically after four hours while taking queue time into consideration?

Thanks & Regards,

Ariharan.R

Ross_K · ‎05-15-2022

Hi @carlosteixeira,

Love the solution. One issue we encountered is that if all of the nodes are bogged down, this workflow cannot run :P

So instead we are using Python to monitor the current active workflows by querying the queue on Mongo. We then need to shut down any that exceed the time and send the appropriate notification emails.

Querying Mongo is easy, constructing the url is easy, but we don't seem to be able to authenticate to cancel it.

For the normal API endpoints we use the api keys, generate an OAuth token and use that. From your workflow it looks like you are using basic auth with a username and password?

We have tried this basic auth api keys and with username/password login credentials both for the user who owns the workflow and for admin, the result is always the same: 401 error, {"data":null,"exceptionName":"UnauthorizedException","innerExceptionMessage":"","message":"User is not authorized to perform operation."}.

Any suggestions on which credentials we should be using here or any specific permissions they need?

Thanks :)

Ariharan · ‎05-21-2022

Hi @Ross-K,

When all nodes are down, manual jobs do not run in the Gallery right.?

One more suggestion, instead of using a Gallery schedule, you can use a Legacy scheduler (Designer).

Regards,

Ariharan.R

Ross_K · ‎05-21-2022

@Ariharan,

It is not that the nodes are "down", they are simply occupied running workflows. If another workflow is executed (either manually or on schedule), it will be queued. this would include any workflow intended to shut down any long-running workflows :P

This is not an issue with the Gallery scheduler, that works well. There is not need to return to the legacy scheduler.

AndrewSu · ‎06-17-2022

@carlosteixeira , i have a partner asking the following after I highlighted the portion of your solution where you are scheduling via Designer and not server.

"This would then require a PC with a designer license to be online, connected to VPN, 24x7, correct? This isn’t a feasible requirement for us given our security + VPN settings. We’d need to leverage our jump box for this but there were licensing concerns about putting a designer license on a shared box."

I believe their understanding is wrong because this workflow has to be run on Designer on the machine that's hosting the Server correct?

carlosteixeira · ‎06-20-2022

Hi @AndrewSu

Exactly, the understanding is wrong. They just need to configure the workflow directly on the designer installed on the Alteryx server machine. Isn't necessary another machine with the Alteryx designer installed. I was running this workflow directly on the Alteryx Server.

Best Regards

MattBenj · ‎09-01-2022

Is there a specific version of server you must have for this to work?

carlosteixeira · ‎09-01-2022

Hi @MattBenj

Not at first, maybe you need to tweak some things but it should work in any version

Thanks

adarsh2707 · ‎11-09-2022

Hello @carlosteixeira

Whenever I run this from my server, It keeps throwing the error - authentication error: generic server error. Do you know what that indicates

adarsh2707 · ‎11-23-2022

Hi All,

I have a doubt,

my workflow is running till the last step but at the download step it reads the below error

Can someone please help me in resolving this error

arun0721 · ‎01-18-2023

@carlosteixeira Thanks for sharing this workflow. However I always get a 401 during executing the api to cancel jobs, I am running it on the server directly. I see that @Ross_K also encountered this. Any idea or suggestion for resolution would be great.

Thanks

Ross_K · ‎01-20-2023

Hi @arun0721 ,

We ended up implementing a custom solution for workflow timeouts that did not involve having to run a workflow to cancel other workflows. Instead it uses MongoDB queries to identify how long workflows have been executing and a front-end http request to cancel any that have been identified as exceeding specified timeout limits.

Bargeton · ‎02-20-2023

Hello,

On my side, this workflow retrieve my jobs running more than 1 hour, but it killed nothing...

When checking the worflow, it does a call to this url:

[base_url_login]+'api/jobs/'+[JobId]+'/cancel/'

But:

1) I see nothing to authenticate to the webapi with API key & secret into this workflow.

2) From my server, checking the webapi documentation

https://my_alteryx_server/webapi/swagger/ui/index

And I have to switch to doc/1 for "job" api... and there is no "cancel"

I see only 2 points:

- get /v1/jobs/{id}/output/{outputId}

Get output for a given job

- get /v1/jobs/{id}

Retrieves the job and its current state

So I'm wondering if this workflow can still work in 2023 with last version(s) of the Alteryx Gallery.

Regards,

Alex

carlosteixeira · ‎02-22-2023

Hi @Bargeton how are you doing?

So, I dont test it yet in the 2023 version.

Probably the applications should be managed to work.

I'm planning tod o it in a few days forward.

If I found the solution I update this post.

SeanAdams · ‎06-18-2023

thank you @carlosteixeira - this is very useful and necessary for large server implementations!

cc: @moinuddin

carlosteixeira · ‎06-19-2023

Thanks @SeanAdams.

It's been a while since I implemented this and it's still working for my client.
But unfortunately, I haven't tested it for the new versions yet.
It's on my to-do list for this year. hahahahaha

Thanks again

Arti2893 · ‎10-13-2023

Can you please explain what parameters are being passed to download tool in detail

Andrej_k · ‎04-29-2024

Hello,

great approach.

Good news that with minor adjustments its working on 2023.1 server version.

We are using it to cancel also scheduled workflows that are running longer than expected (eg. issue with database). Weak moment is related to sessions input. Found out that active sessions from sessions table does not mean that session is actually active. It just tracks records of all sessions that were active. So if all sessions on gallery side are expired, we will still have records about active session, upon sending http request to delete job, it will respond that we have "invalid_session" and obviously nothing will happen. So out of working hours this solution does not work.

Did anyone found workaround?

Thinking about 2 options: open new session upon need (not sure how to do it though); or prolong session (do some fake activity).

Andrej_k · ‎05-07-2024

Just to post update:

implemented first approach - if there are any jobs to be cancelled will enable control container with python script that authenticates to gallery (requests, requests_ntlm libraries) and returns valid session_id, which is being used further in download tool to cancel job.

Still consider that such API endpoint is must have for server...

Arman_Khachatryan · ‎05-07-2024

Hi Andrej,

Thank you for exploring it. Is your download tool working to cancel the job. This flow was working for me without errors, but recently the tool is not cancelling the job. I am checking API Documentation in Alteryx Server Swagger but can't find an endpoint to cancel the job. Do you know if they don't support it anymore?

I appreciate your help in advance.

Thank you

Andrej_k · ‎05-07-2024

Hi Arman,

such endpoint does not exist in API so swagger won't help. We are sending http delete request to "imitate" as someone has manually pressed button "Cancel" in Gallery via browser.

First what is coming to my mind - check response from download tool (if http request was not correct server will respond somehow). Eg. it could be invalid_session as was in my case during non working hours;

If some updates were installed on server side, there might be different headers in http request. Then I suggest you to check what headers are being sent once you press on Cancel button (dev tools in browser, network tab) and update text input with those.

Hope it will be of any help :)

Arman_Khachatryan · ‎05-08-2024

Thank you for your response Andrej. I checked and there were some changes in the server side. I changed the parameters accordingly but in my case I see 401 Unauthorized error. I am using username and password for authentication.

I would appreciate if you have any ideas and thank you for trying to help.

Thanks again

fmvizcaino · ‎05-08-2024

Hi @Andrej_k ,

I totally understand if you can`t, but since you have adapted the solution from the great @carlosteixeira , it would be super nice if you could share your solution with everyone else here. 😁

I`m not asking for myself, just for the benefit of everyone else who may need this fantastic feature (which should be native, in my opinion 🙄)

Andrej_k · ‎05-10-2024

Hi all,

@fmvizcaino unfortunately here are some restrictions and I am unable to attach workflow, only screen shots and code snippet. So python script is the following:

[1]:

# List all non-standard packages to be imported by your 
# script here (only missing packages will be installed)
from ayx import Package
Package.installPackages(['requests','requests_ntlm'])

[2]

from ayx import Alteryx
import pandas as pd
import requests
from requests_ntlm import HttpNtlmAuth

def authenticate_and_get_session_df(username, password, url):
    headers = {
        'Content-Type': 'application/json',
        'Accept': 'application/json',
    }
    data = {
        "scheme": "windows",
        "parameters": [{"name": "updateLastLoginDate", "value": True}]
    }

    ntlm_auth = HttpNtlmAuth(username, password)

    try:
        response = requests.post(url, headers=headers, json=data, auth=ntlm_auth, verify=False)
        response.raise_for_status()
        response_data = response.json()
        df = pd.DataFrame(response_data, index=[0])
        return df
    except Exception as e:
        print(f"Authentication failed for URL {url}: {e}")
        return pd.DataFrame()

# Attempt to read credentials from Alteryx, handle exceptions
try:
    credentials_df = Alteryx.read("#1")
except Exception as ex:
    print(f"Error reading credentials from Alteryx: {ex}")
    credentials_df = pd.DataFrame()

if credentials_df.empty:
    print("No credentials found. Exiting.")
    session_df = pd.DataFrame({'sessionId': [None]})
else:
    # Extract values from the single row
    row = credentials_df.iloc[0]
    username = row['username']
    password = row['password']
    url = row['url']

    # Call the authentication function with extracted values
    session_df = authenticate_and_get_session_df(username, password, url)
    if not session_df.empty:
        print(f"Authentication successful for URL {url}!")
        print(session_df)
    else:
        print(f"Authentication failed or session data not retrieved for URL {url}.")
    
#export to first anchor
Alteryx.write(session_df, 1)

values are taken from Browser inspect (can use also Fiddler to track all requests responses). So we need to find step where we update session and get values: username, password, url. those should be sent as input in python script. output is sessionId value. the rest is the same as in original concept. Assume that this approach might be working only for the same authentication type as we have.

@parmarar sorry, hard to trouble shoot this part. Maybe worth to contact Alteryx Support. The only idea is to check all columns that are returned from Download tool. For successful cancellation I also have 401 Unauthorized response in download headers, but job is being cancelled. Download data column is empty if all is OK. Alternative option is again to do some workaround with python scripts.

carlosteixeira · ‎05-24-2024

Hello everybody. How are you?

After a long period (2 years) I reviewed the flow and updated this post. I included a new workflow with some adjustments and improvements due to new versions and discontinued tools.

You will see that in the new workflow, I am using a connection with MONGODB to be made via ODBC and no longer with the MONGO connector.

This new workflow is working in version 2023.2.1.51 Patch 1

Any questions, I am available.

Cheers

Carlos A Teixeira - Alteryx ACE

Arti2893 · ‎07-24-2024

@Arman_Khachatryan Were you able to find solution on 401 error. I am facing same issue

carlosteixeira · ‎07-25-2024

Hi @Arti2893 If I'm not wrong they create a service account and use it to keep the connections active.

Engine Works

How to Cancel Manual Jobs in Alteryx Server

Scenario

Problem

Solution

Part 1

Part 2

Part 3

Part 4

Implementation