
Python SDK Input Tool does not maintain and propagate metadata


Python SDK Source Tool - Save Metadata after Running Tool


Hi everyone,


I created an input tool with the Python SDK that loads data from a remote interface and outputs it for use in Alteryx. I do not know the metadata for my data in advance, so I download the data and compute the metadata from it, which works fine.
Now I want to build the rest of the workflow, but adding any other tool, e.g. a Select tool, to my tool's output leaves me with no metadata. The metadata from the last run is not properly passed downstream, even though it is shown at the input anchor (possibly left over from that run).


Looking at the tool lifecycle, it appears that when a downstream tool is added, Alteryx instantiates a new AyxPlugin, which is then expected to produce the metadata and pass it to the downstream tools. The problem is that this new instance no longer has the downloaded data, so I cannot push that data's metadata. One solution would be to save the metadata in a local variable and push it on the next call, but since the AyxPlugin is created anew with each new tool, all local variables are reset. So I would need some kind of storage that persists across different instances of the Python class. Another idea is to use a temp file, but it would have to be deleted when the workflow or Alteryx is closed. (There is a function, AyxEngine.create_temp_file_name, that creates such a temp file, but then I would need to save the file name somewhere and I have the same problem.)


Do you know whether there is a temp location that persists for the life of the workflow (not just a single run), or some other way to store the metadata?


This is normal behavior for all tools in Alteryx, not only Python SDK tools. There are two solutions. One is fairly simple and is the method used by 99.9% of built-in tools, so we'll start with that.

Quick background for you, somewhat simplified. When your tool is selected on the canvas and the configuration window is shown, a new instance of it is created. This new instance is initialized with the configuration settings it last had and the Alteryx engine expects a few other things to happen. When you click away from that tool, something called a configuration or update run happens across the canvas. The metadata from your tool is passed downstream through the workflow and each of the other tools updates if necessary. Think of it as running your workflow, but only on the metadata itself. Then that fresh instance of your tool is deleted. The whole process happens again the next time your tool is selected. That means nothing is persistent.

In your case, here is what you can do. If you are using the base Python SDK and not the Snakeplane wrapper, you can access the alteryx_engine object passed into your tool's __init__ function. That object has a Boolean property called UpdateOnly. Since your tool is an input tool, the engine will call its pi_push_all_records function. Inside that function, check the value of self.alteryx_engine.UpdateOnly. If it is True, the engine is performing the configuration/update run described above; in that case, if possible, download only the data or metadata from the source that is required to set the outgoing metadata. If it is False, you are in an actual workflow run and your tool should do what it is already doing.
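A minimal sketch of that branch inside pi_push_all_records. The StubEngine class below is a stand-in for the real alteryx_engine object (which is only available inside Alteryx), and the two push_* helpers are placeholders for your existing code; in the SDK sample tools the flag is commonly read as a string via get_init_var(tool_id, 'UpdateOnly'), which is the pattern shown here — depending on your SDK version it may also be exposed directly as a property, as described above.

```python
class StubEngine:
    """Stand-in for the Alteryx engine object (assumption for this demo)."""

    def __init__(self, update_only: bool):
        self._update_only = update_only

    def get_init_var(self, tool_id: int, var_name: str) -> str:
        # The real engine returns init vars as strings ('True' / 'False').
        if var_name == 'UpdateOnly':
            return 'True' if self._update_only else 'False'
        return ''


class AyxPlugin:
    def __init__(self, n_tool_id: int, alteryx_engine):
        self.n_tool_id = n_tool_id
        self.alteryx_engine = alteryx_engine

    def pi_push_all_records(self, n_record_limit: int):
        if self.alteryx_engine.get_init_var(self.n_tool_id, 'UpdateOnly') == 'True':
            # Configuration/update run: fetch only what is needed to build
            # and push the outgoing metadata (record_info), then stop.
            return self.push_metadata_only()
        # Actual workflow run: download the data and push the records.
        return self.push_all_data()

    def push_metadata_only(self):
        return 'metadata only'  # placeholder for your record_info setup

    def push_all_data(self):
        return 'full data'      # placeholder for your download + push loop


# The engine sets UpdateOnly=True during a configuration run:
print(AyxPlugin(1, StubEngine(True)).pi_push_all_records(0))   # metadata only
print(AyxPlugin(1, StubEngine(False)).pi_push_all_records(0))  # full data
```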

If downloading the data on each configuration run is too expensive and you must cache it, there are other workarounds. What a couple of built-in tools have done (and I won't say which) is modify the tool's actual configuration XML. This can be done in pi_init and/or pi_close. The idea is to have an initially empty configuration field in the XML for a temporary file path that will contain your metadata. Once that metadata is generated and saved, set that path in pi_init and/or pi_close. If the field is not blank when you read your configuration XML in pi_init, load the metadata from that file (if it still exists). This is more complex, and maintaining that cache is a pain, but it can work.
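The caching idea above can be sketched like this with the standard library. The element name MetadataCache and the JSON cache format are assumptions for the demo, not part of the SDK; in a real tool the str_xml string is what the engine hands to pi_init, and you would clean up the temp file yourself (which is the cache-maintenance pain mentioned above).

```python
import json
import os
import tempfile
import xml.etree.ElementTree as ET


def pi_init(str_xml: str):
    """Parse the tool configuration; return cached metadata, or None on a miss."""
    root = ET.fromstring(str_xml)
    node = root.find('MetadataCache')          # assumed field name for the demo
    path = node.text if node is not None and node.text else ''
    if path and os.path.exists(path):
        with open(path) as f:
            return json.load(f)                # cache hit: reuse saved metadata
    return None                                # cache miss: must (re)compute


def save_cache(str_xml: str, metadata: dict) -> str:
    """Write metadata to a temp file and record its path in the config XML."""
    fd, path = tempfile.mkstemp(suffix='.json')
    with os.fdopen(fd, 'w') as f:
        json.dump(metadata, f)
    root = ET.fromstring(str_xml)
    node = root.find('MetadataCache')
    if node is None:
        node = ET.SubElement(root, 'MetadataCache')
    node.text = path                           # initially empty field now filled
    return ET.tostring(root, encoding='unicode')


# First instance: empty field, no cache. After a run, the saved config
# lets the next instance's pi_init find the metadata again.
config = '<Configuration><MetadataCache /></Configuration>'
assert pi_init(config) is None
config = save_cache(config, {'fields': [{'name': 'id', 'type': 'Int64'}]})
assert pi_init(config) == {'fields': [{'name': 'id', 'type': 'Int64'}]}
```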

David Wilcox
Senior Software Engineer