Dev Space

chrisha · ‎08-16-2019

Hey there,

I'm currently a bit frustrated with package management in the Python tool: since there is only one venv and one kernel available, all workflows on a machine have to use the same packages and package versions. This led me to post this suggestion for future Alteryx versions: https://community.alteryx.com/t5/Alteryx-Designer-Ideas/Python-Tool-Managing-Virtual-Environments-ve...

However, I'm now running into a concrete problem. Alteryx ships with scikit-learn==0.20.0, and I have a model developed in a Python environment using this package version. The model was pickled and is loaded in the Python tool. So far so good. In another project, I'm using a different package that requires scikit-learn>=0.21.

I could install the newer version in the PythonTool_venv quite easily. But this would potentially break my pickled model (loading pickled objects using a package version different from the original environment can be problematic). So, updating the model would require me to:

1. Update the original development venv for model 1.
2. Retrain model 1 and update the pickled model.
3. Update the PythonTool_venv on my machine.
4. Update the PythonTool_venv on any other machine working on the same project (and potentially other projects!).
5. Update the PythonTool_venv on our Alteryx Server.
6. Deploy the updated workflow and updated model to all colleagues and the Alteryx Server.

The process is cumbersome and might break other projects on other machines.
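
One way to at least make a version mismatch visible is to store the package version next to the pickled model and check it before unpickling. The following is only a minimal sketch, not something Alteryx or scikit-learn provides: the helper names `dump_with_version` and `load_checked` are made up for illustration, and it assumes Python 3.8+ for `importlib.metadata` (on an older interpreter a backport would be needed).

```python
import pickle
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def dump_with_version(obj, path, package, pkg_version=None):
    """Hypothetical helper: pickle `obj` together with the package version.

    The model is stored as nested pickle bytes so that the wrapper can be
    read later without unpickling the model itself.
    """
    payload = {
        "package": package,
        "version": pkg_version or installed_version(package),
        "model": pickle.dumps(obj),
    }
    with open(path, "wb") as fh:
        pickle.dump(payload, fh)


def load_checked(path, current_version=None):
    """Hypothetical helper: refuse to unpickle if the versions differ."""
    with open(path, "rb") as fh:
        payload = pickle.load(fh)
    if current_version is None:
        current_version = installed_version(payload["package"])
    if current_version != payload["version"]:
        raise RuntimeError(
            f"{payload['package']} is {current_version}, "
            f"but the model was pickled with {payload['version']}"
        )
    return pickle.loads(payload["model"])
```

In the training environment this would be called as `dump_with_version(model, "model.pkl", "scikit-learn")`, and in the Python tool as `load_checked("model.pkl")`. It does not remove the need to keep the environments in sync; it only turns a silent mismatch into an explicit error.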

Does anyone have any suggestions for working around the missing dependency management in the Python tool?

The only alternative I can currently think of is to package the models in Custom Tools using the Python SDK, where I can manage my own virtual environment. But this is also quite tedious, as I would have to write all the record reading and writing code that the Python tool otherwise takes care of.

Better ideas?

Best

Christopher

cam_w · ‎08-16-2019

Hey Christopher,

I don't disagree with you, and I steer clear of updating the default packages in PythonTool_venv for your well thought out reasons. You might also want to try submitting a ticket to the scikit-learn repo if they break stuff between releases. Changes to functionality should be managed, communicated and deprecated over time. But we all know ... sometimes stuff just breaks! 🙂

My suggestion would be to try SnakePlane. It actually makes the process of Record reading/writing easier, so you could try that out and see if you like it better than the 'straight' SDK approach.

I'll drop a comment on your idea as well ... 🙂

Kind regards,

Cameron

chrisha · ‎08-19-2019

Hi Cameron,

thanks for your feedback!

I haven't gotten SnakePlane to work yet, since I have a weird hybrid development environment involving VS Code on macOS and Alteryx in a Windows VM. Will give it another try, though. 🙂 It certainly makes developing for the Python SDK easier...

The sklearn problem was more of an example. There aren't any breaking changes between minor versions in my case. Nevertheless, loading pickled objects with a different package version can sometimes have unexpected results even when features didn't change. For me it's more about having a consistent and manageable environment for my models in general.
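
To illustrate what a "consistent" environment could mean in practice, here is a rough sketch of a check that compares a set of exact version pins against what is actually installed. The function name `find_mismatches` and the `lookup` parameter are made up for this example, and it again assumes Python 3.8+ for `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version


def _installed(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=_installed):
    """Hypothetical helper: compare exact pins with installed versions.

    `pins` maps package names to the exact version a model expects.
    Returns {package: (wanted, found)} for every package that differs;
    `found` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, wanted in pins.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches
```

Calling `find_mismatches({"scikit-learn": "0.20.0"})` at the top of a Python tool cell would return an empty dict when the venv matches the pin, so a workflow could stop early instead of loading a model into the wrong environment.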

Best regards

Christopher

Managing Python requirements (Python Tool)