Dev Space

JPKa · ‎11-28-2017

Thought I'd share a little trick that came in handy while diagnosing and fixing a problem I hit when working on a Tensorflow-based tool with the Python SDK.

It looks like it's choking on the attempt to import tensorflow.

Hmmmmmmm, this is interesting, because I know Tensorflow imports without any issues in a standard Python session from the SDK.

Imports with no issues from a standard Python session.

What's going on here, and how do I dig in and solve the problem?

The key thing to understand is that when a Python SDK tool's code is executed, it is run in a special Python process embedded in Alteryx's main C++ process. This provides a number of huge advantages, by allowing the SDK's "plumbing" oriented operations to be performed by low-level, lightweight, and efficient C++ code, leaving the pure Python for the unique functionality provided by the tool.

The key disadvantage is that most Python IDEs don't support embedded interpreters, leaving you with a smaller tool set for debugging your code. There also isn't a way to fire up a REPL that runs C++ with an embedded Python interpreter to help debug code in isolation. Finally, certain libraries may rely on interpreter state or environment variables that aren't available in an embedded process, and these won't import without errors.

The good news:

Python comes with a built-in debugger called pdb. Take a wild guess what it stands for.

Let's take a spin with pdb and see if we can fix our Tensorflow error.

First, we comment out the tensorflow import line and add an import of the pdb library. Then we add a line below it that calls pdb.set_trace() to set an interactive breakpoint.
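A minimal sketch of that edit, assuming the plugin module normally begins with the Tensorflow import. The `AYX_PDB` environment variable and the `debug_enabled` helper are hypothetical additions for illustration (they are not part of the SDK); they only keep the breakpoint from firing on every run. A bare `pdb.set_trace()` call works just as well for a quick session.

```python
import os
import pdb

# import tensorflow as tf  # temporarily disabled while we investigate the import error


def debug_enabled(env=os.environ) -> bool:
    """Return True when the (hypothetical) AYX_PDB flag is set to "1"."""
    return env.get("AYX_PDB") == "1"


if debug_enabled():
    # Execution pauses here and the console shows the (Pdb) prompt.
    pdb.set_trace()

# The rest of the plugin module continues below.
```

With the flag unset the module behaves as if the breakpoint were not there, which makes it harder to leave a stray breakpoint active by accident.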

Next, after saving our edited Python file, we use the AlteryxEngineCmd application to run a workflow that contains the tool we are working on. As you can see, it pauses at the line where we placed the pdb.set_trace() command, and we see a REPL with the (Pdb) prefix.

Now that we have a REPL available to us in the context of the embedded Python process, let's try importing Tensorflow again.

It looks like the tensorflow library expects the sys.argv attribute to be present. This is the kind of failure we expect in an embedded process: the library assumes it is running in a non-embedded Python process, where the interpreter populates attributes like sys.argv at startup.

No surprise here. Tensorflow wants a variable that is present in the non-embedded process, but not available in our embedded Python.

Let's see if we can fix that, shall we?

First, let's fire up the non-embedded Python REPL again, and take a look at the sys.argv variable.

Our non-embedded Python process has a list with a single, empty string assigned to sys.argv.

We know that Tensorflow imports successfully in the non-embedded process, and that one difference is the absence of this variable in the embedded process. Let's launch the workflow again, and do a quick test in the pdb REPL to see if we can learn more.
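One way to hunt for differences like this is to print the same few interpreter attributes in both processes and compare the output. The sketch below is only an illustration: `interpreter_snapshot` is a hypothetical helper, and the attributes it collects are a guess at what is worth comparing, not a list from the SDK.

```python
import sys


def interpreter_snapshot(module=sys) -> dict:
    """Collect a few interpreter attributes that can differ between processes."""
    return {
        "has_argv": hasattr(module, "argv"),
        "argv": getattr(module, "argv", None),
        "executable": getattr(module, "executable", None),
        "prefix": getattr(module, "prefix", None),
    }


if __name__ == "__main__":
    # Run this once in the standard REPL and once at the (Pdb) prompt, then compare.
    for key, value in interpreter_snapshot().items():
        print(f"{key}: {value!r}")
```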

First, we import the sys library. Then we set the value of the sys.argv attribute to the value we saw in the non-embedded process. We try to import Tensorflow, and it works.

As you can see, the sys.argv variable is the "A/B switch" for the failed Tensorflow import.

As a pragmatist, I simply add the lines into my plugin to give Tensorflow what it wants.

Add in the sys.argv variable that Tensorflow wants to our plugin code.
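A rough sketch of what those lines can look like. The `hasattr` guard and the `ensure_argv` helper are my own assumptions rather than the exact code from the screenshot; assigning `sys.argv = ['']` directly, as in the pdb session, has the same effect in the embedded process.

```python
import sys


def ensure_argv(module=sys) -> list:
    """Give the interpreter a sys.argv if the embedding host did not set one."""
    if not hasattr(module, "argv"):
        module.argv = [""]  # same value seen in the non-embedded REPL
    return module.argv


ensure_argv()
# import tensorflow as tf  # re-enable here, after sys.argv is guaranteed to exist
```

The guard leaves an existing `sys.argv` untouched, so the same module can still be imported from a normal Python session without clobbering real command-line arguments.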

The tool now runs without incident.

Here, we can now see that the problem was solved.

Another neat trick to keep in mind is that this is a great tool for exploring the Python SDK's core data objects and methods, as well as the data being passed around to various methods.

Here we add the pdb.set_trace() command into the body of the pi_init function.
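As a rough illustration, the sketch below uses a stand-in class rather than a real SDK plugin: the constructor is simplified, the `AYX_PDB` flag is hypothetical, and the XML handling is only an assumption about what a tool might do with its configuration. The part that matters is where the breakpoint sits relative to `str_xml`.

```python
import os
import pdb
import xml.etree.ElementTree as Et


class AyxPlugin:
    """Stand-in for a Python SDK plugin class; only pi_init is sketched here."""

    def __init__(self):
        self.config = {}

    def pi_init(self, str_xml: str):
        # Pause here to inspect str_xml interactively (guarded by a hypothetical flag).
        if os.environ.get("AYX_PDB") == "1":
            pdb.set_trace()
        root = Et.fromstring(str_xml)
        # Store each top-level configuration element as tag -> text.
        self.config = {child.tag: child.text for child in root}
```

At the (Pdb) prompt, `p str_xml` prints the variable, `n` steps to the next line, and `c` resumes the workflow.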

Now that we've added the set_trace command, let's run the workflow again from the command prompt, and inspect the contents of the str_xml variable passed into the method.

Here, we can see the contents of the str_xml passed to AlteryxEngine from the config window of the tool.

Hopefully, this is useful to all of you folks out there hacking away on the Python SDK.

Looking forward to seeing you build great things with it.

JP Kabler
Lead Software Engineer, Assisted Modeling
Alteryx

TashaA · ‎11-29-2017

Nice write up @JPKa! I'm definitely going to try this out.

pavloko · ‎02-04-2019

Great article, but is there a debugging strategy for developing .yxi Custom Tools with the Python SDK?

JPKa · ‎02-04-2019

@pavloko I'm not sure what you mean? This article is exactly for that purpose. Are you looking for a guide or tutorial?

JP Kabler
Lead Software Engineer, Assisted Modeling
Alteryx

pavloko · ‎02-05-2019

Yes, I missed the point that the tool should be part of a .yxmd workflow.

Do you need to have a specific license, or does it come as part of Alteryx Designer and need to be turned on manually?

pavloko · ‎02-06-2019

@JPKa, could you also provide some insight into when Alteryx Designer will reload the code? When I make a change in the Python code, clicking outside and back on the tool doesn't pick it up the way it does with JavaScript. I have to reopen the workflow for changes to take effect. Is this normal?

BlytheE · ‎02-06-2019

Hi @pavloko, the Python SDK is interacting with the engine, so in order to test changes made within your Python code, you will need to run the tool in a workflow (you shouldn't have to reopen the workflow each time). Some Python backend errors may appear when you click on and off of the tool and they are likely related to the initialization of the tool.

Python SDK: How to Debug an Error using pdb package