
Alteryx Designer Ideas

Python Tool: Managing Virtual Environments (venv) and Requirements

With a growing number of projects that use different machine learning models, it is becoming difficult to manage package versions across workflows. Currently, the Python tool has a single virtual environment, so every project has to develop its models against the same Python and package versions as that environment. This rarely affects the code itself, but it becomes a problem as soon as we store and load pickled models, which are sensitive to even minor changes in package versions.
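
To illustrate the pickling problem, here is a minimal sketch (not part of the Python tool) that stores the versions of the relevant packages next to a pickled model and refuses to load it when the current environment differs. The function names and the `.versions.json` sidecar file are assumptions made for this example only.

```python
import json
import pickle
from importlib import metadata
from pathlib import Path


def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def _sidecar(path):
    # Hypothetical naming convention: model.pkl -> model.pkl.versions.json
    return Path(f"{path}.versions.json")


def save_model(model, path, packages):
    """Pickle the model and record the versions of the packages it depends on."""
    versions = {name: installed_version(name) for name in packages}
    Path(path).write_bytes(pickle.dumps(model))
    _sidecar(path).write_text(json.dumps(versions))


def version_mismatches(path):
    """Map package name -> (saved version, current version) for every difference."""
    saved = json.loads(_sidecar(path).read_text())
    return {
        name: (old, installed_version(name))
        for name, old in saved.items()
        if old != installed_version(name)
    }


def load_model(path):
    """Unpickle the model only if the recorded package versions still match."""
    mismatches = version_mismatches(path)
    if mismatches:
        raise RuntimeError(f"Package versions changed since saving: {mismatches}")
    return pickle.loads(Path(path).read_bytes())
```

Checking the sidecar before unpickling matters because the unpickling step itself can fail or misbehave when package versions differ. This only detects the mismatch; it does not remove the need for separate environments.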

 

This is even more of a problem on Alteryx Server, where different teams might need different packages. Currently, only the server admin can install packages on the server, and there can be only one version of each package.

 

So, more robust venv management in the Python tool would be much appreciated!

5 Comments
c2willis
11 - Bolide

@BlytheE- for visibility 🙂

 

My thoughts on this are not well developed ... however, for the past few months I've been leaning towards 'images' as the best solution for this issue, whether docker images or some other solution. The flow would go something like this:

 

  1. Alteryx ships a default image with the Python tool that users could run out of the box as a stand-alone Python container, hosting the Jupyter server and Python code with the default packages already installed.
  2. Users that need to install additional packages would create a new image layered on top of the default image.
  3. Users would have a control - drop down or whatever - to select the image that they want to use for the python tool.
  4. The image Dockerfile (and/or compose yml file) and settings would save with the Python tool for copying to new workflows, or sending to other users.
  5. The layering of images should reduce the amount of space required compared to an equivalent Virtual Environment solution.

Downsides:

  • Packaging docker with Alteryx might be a hard sell! However, it might be a nice optional 'add on' for environments that already have docker installed.
  • Image management would become a 'thing', and users would need an understanding of how to do this, or a tool to help them decide what they need to keep.

Anyway, those are just my thoughts ... 🙂

chrisha
10 - Fireball

Good thoughts, @c2willis ! Docker containers for workflows more generally would be a great way to improve deployment feasibility, especially when different machines have different SDKs and tools installed (e.g. several machine learning packages for Python require Windows C++ SDK installations, which might not be available on all machines).

 

I fear that integrating Docker will be many months of hard work for the Engine team and might be a bit out of scope. For the narrower use case of the Python tool, virtual environments are already a well-established and easy-to-configure system. Furthermore, Alteryx already makes use of virtual environments for the Python SDK custom tools. The challenge might be integrating the details for the venv within the Python Tool.
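
As a rough sketch of how little is needed on the plain Python side, the standard-library `venv` module can create a per-project environment, and that environment's own pip can then install from a requirements file. The function names and paths below are hypothetical, and nothing here shows how to point the Python tool at such an environment, which is the part this idea asks for.

```python
import os
import subprocess
import venv
from pathlib import Path


def create_project_env(env_dir, with_pip=True):
    """Create a virtual environment and return the path to its interpreter."""
    env_dir = Path(env_dir)
    venv.create(env_dir, with_pip=with_pip)
    # The interpreter lives in Scripts\ on Windows and bin/ elsewhere.
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_requirements(python_exe, requirements_file):
    """Install pinned packages into the environment that owns python_exe."""
    subprocess.run(
        [str(python_exe), "-m", "pip", "install", "-r", str(requirements_file)],
        check=True,
    )
```

For example, `install_requirements(create_project_env("envs/project_a"), "requirements.txt")` would give one project its own pinned package set without touching any other environment.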

TimN
8 - Asteroid

We need this as well.

 

Thanks.

claugreco
5 - Atom

Managing Virtual Environments within the Python tool will be very beneficial for us too.

YEM
7 - Meteor

If you upgrade Alteryx Server, all of the packages you installed are removed.

 

Example.  Let's say in the past you did something like this:

from ayx import Alteryx
Alteryx.installPackages(package="tableau-api-lib")

 

The Python executable that comes with Alteryx Server is located here:

import sys
print(sys.executable)

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\python.exe

 

From that we can tell that the tableau-api-lib package installed above ends up here:

c:\Program Files\Alteryx\bin\Miniconda3\envs\JupyterTool_vEnv\Lib\site-packages
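
Rather than inferring the folder from the interpreter path, the running environment can be asked directly. This is a small sketch using only the standard library; the function name is made up for the example.

```python
import sys
import sysconfig


def environment_summary():
    """Describe which interpreter is running and where pip installs pure-Python packages."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "site_packages": sysconfig.get_paths()["purelib"],
    }


for key, value in environment_summary().items():
    print(f"{key}: {value}")
```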

 

When you upgrade Alteryx Server, it apparently rebuilds this folder from scratch, so all custom modules installed with pip are lost. Bummer!

 

Being able to invoke a virtual environment that is maintained by us would avoid this upgrade fire.
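
Until something like that exists, one possible workaround is to snapshot the installed packages to a requirements file kept outside the Alteryx install folder before upgrading, then reinstall from it afterwards. This is only a sketch: the function name and output path are hypothetical, and it has not been tested against an actual Alteryx Server upgrade.

```python
from importlib import metadata
from pathlib import Path


def snapshot_requirements(output_path):
    """Write 'name==version' pins for every installed distribution and return them."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            pins.add(f"{name}=={dist.version}")
    ordered = sorted(pins, key=str.lower)
    Path(output_path).write_text("".join(pin + "\n" for pin in ordered))
    return ordered
```

Run from inside the Python tool, this would capture the packages in the tool's environment; the resulting file could then be passed to `pip install -r` after the upgrade.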