Loved by analysts, data scientists, and software engineers, Jupyter notebooks are wonderfully interactive and portable. Their clever interface helps us visualize analytics, run machine learning models, and explore new possibilities. Hats off to Jupyter. It's made a lot of peoples' jobs easier.
Once you have a good Jupyter notebook (or Python script, for that matter), what do you typically do with it? How easy is it to integrate into a broader process? An interactive process? For that matter, how easy is it to integrate any Python script into a broader process? It's time to squeeze more value out of these powerful tools.
That's the job of the Jupyter Flow tool. This tool is available for Alteryx 2020.4 and later in the Laboratory. Download Jupyter Flow here.
The two things you need to run the Jupyter Flow tool are:
1. A Jupyter notebook
2. A site-packages folder
The two inputs required to run a Jupyter Flow tool, shown in the config pane.
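If you're not sure where the site-packages folder for your notebook's environment lives, Python can tell you. A quick sketch (run it with the same interpreter your notebook uses; `site.getsitepackages()` may be unavailable in some older virtualenv-created environments):

```python
# Locate the site-packages folder(s) of the current Python environment --
# this is the folder you point the Jupyter Flow tool at.
import site

site_packages_paths = site.getsitepackages()
print(site_packages_paths)
```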
After the first run (which, depending on the size of your site-packages folder, can take a while - kind of like a trip to Jupiter), you will be gifted with a .pyz file. This is your ticket to sharing the workflow, running it on Server, and sharing environments (on a network drive, perhaps?).
The .pyz file which unlocks sharing, Server runs, and more.
Switch off the package watcher if you'd like to share your workflow or run it on Server.
The packages toggle, which enables/disables environment building.
Surely you'll want to pass Alteryx data into and out of your Jupyter notebook.
Jupyter Flow tries to work noninvasively with your notebooks, so reading data from or writing data to Alteryx involves comments in the form of input and output tags.
There are four possible tags:
`#ayx_input`, `#ayx_output`, `#ayx_input=`, `#ayx_output=`
These tags are placed inside your notebook, above the data frame(s) you would like to replace with Alteryx data or output to Alteryx data. These tags do nothing when you're running your notebook outside of Alteryx. When Alteryx runs the notebook, however, it picks up on these tags and modifies the code to pass data in/out.
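For instance, a notebook cell using these tags might look like the following. This is a minimal sketch; the `sales` DataFrame, its columns, and the connection numbers are made up for illustration:

```python
import pandas as pd

#ayx_input=#1
# Running the notebook on its own, this assignment executes normally.
# When Alteryx runs the notebook, the tag above tells Jupyter Flow to
# replace this DataFrame with the data arriving on input connection #1.
sales = pd.DataFrame({"region": ["east", "west"], "revenue": [1200, 950]})

#ayx_output=#1
# The tag above marks this DataFrame to be sent out on output connection #1.
summary = sales.groupby("region", as_index=False)["revenue"].sum()
```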
For example, you may have the following code in your Jupyter notebook, along with the following Jupyter Flow tool with available input connections #1 and #2:
Now when Alteryx runs the Jupyter Flow tool, `lung_cancer_images` will be set to the data coming in on connection #2 in the workflow shown above, instead of `get_lung_cancer_images_dataframe()`. Alteryx will generate and run a `_post_processed` version of the notebook (and print a path to it in the workflow messages); this is the version of the notebook wired up to Alteryx. You can view and debug this notebook (see the instructions under Advanced Options or the help docs).
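Conceptually, the substitution works something like the sketch below. This is a hypothetical illustration only: the real `_post_processed` notebook reads from Jupyter Flow's own data cache, whose format and location are internal, so the CSV file and its path here are stand-ins:

```python
import os
import tempfile

import pandas as pd

# Pretend Alteryx has cached the data arriving on connection #2.
# (Hypothetical cache file; the actual cache is managed by Jupyter Flow.)
cache_path = os.path.join(tempfile.gettempdir(), "ayx_connection_2.csv")
pd.DataFrame(
    {"image_id": [101, 102], "label": ["benign", "malignant"]}
).to_csv(cache_path, index=False)

#ayx_input=#2
# Original notebook line:
#   lung_cancer_images = get_lung_cancer_images_dataframe()
# Post-processed replacement: load the cached connection #2 data instead.
lung_cancer_images = pd.read_csv(cache_path)
```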
But wait, there's more! Jupyter Flow also helps you with:
Managing your Jupyter Flow generated environments
Debugging support - run through your notebook line by line using Alteryx data from a previous run
Data cache location configuration (for security or data management needs)
You may enable custom zip app (.pyz file; the environment file) paths. This lets you generate your .pyz environments once, save them on a network drive or other shared location, and force the tool to use that environment. Find this option under the Advanced accordion:
When Jupyter Flow runs your notebooks, it modifies a copy of them in the same directory as the notebook. The copy will have `_post_processed` appended to its name. If you open this notebook, you can see what Jupyter Flow has done to enable data to flow into and out of the notebook. You can also run the notebook. However, for performance and security reasons, Jupyter Flow deletes its data caches by default. So to run these `_post_processed` notebooks, enable data cache backup under Advanced Options:
Configure Data Cache Location
For security and performance reasons, Jupyter Flow automatically deletes all of its data caches after each run. However, if data cache backup is enabled (for debugging or other purposes), these caches will stick around. When Jupyter Flow runs, the messages section of Designer will inform you of where this data is being cached. If you do not like its chosen cache location (the system's temp directory), you may change this by setting the data cache location to "custom" and typing a path to your desired location:
Integrate your notebooks with Alteryx using the Jupyter Flow tool and let us know what you think!