
Data Science

Machine learning & data science for beginners and experts alike.
JeffA
Alteryx

Welcome to the Jupyter Flow Basics guide! If you're well versed in Jupyter and Python virtual environments, you may prefer the Jupyter Flow Help docs, which cut straight to the chase, or you may want to skip to section V in this guide. Visit Introducing the Jupyter Flow tool for an introduction to the tool and its capabilities.

 

Here you will learn how to install Python, manage virtual environments, create Jupyter notebooks and integrate them into Alteryx workflows, export and share those workflows, and run them on Server. Sections I through IV are intended for users who have no experience with Python, virtual environments, Jupyter, or creating Jupyter notebooks.

 

Before starting this guide, make sure you have Alteryx Designer version 2020.4 or later, and have installed Jupyter Flow onto your system.

 

Note: the following guide uses Windows command line arguments. These arguments will be different depending on the operating system you’re using.

 

 

 

I. Install Python 3.8.5

You can skip this step if you already have Python 3.8.5 installed, or if you already have a preferred method for managing Python installations. Simply note that this tool has only been tested with Python 3.8.5 environments.

 

  1. Download the correct Python installer for your system

  2. Follow the instructions to install Python 3.8.5

  3. Open a new command prompt and type “python” and press “Enter”. If you get a python interpreter like this, you can skip steps 4 through 6:

    index.png

  4. Find out where python was installed

    1. If installed as a user, you may find it at "C:\Users\your_username\AppData\Local\Programs\Python\Python38\"

  5. Add this path to your user PATH environment variable

  6. If you would prefer not to add this to your PATH, simply use the full path to python for the rest of this guide.

    1. Wherever this guide shows a command of the following form, replace python with the full path to your installed Python executable (ex: "C:\Users\your_username\AppData\Local\Programs\Python\Python38\python.exe"):

      python <<some_commands>>
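To confirm which interpreter the python command actually resolves to, a short script like the one below can be saved to a file and run with python. This is only a minimal sketch: the function name describe_interpreter is illustrative and not part of Jupyter Flow.

```python
import sys


def describe_interpreter():
    """Return the running interpreter's path and its version as 'major.minor.micro'."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return sys.executable, version


if __name__ == "__main__":
    path, version = describe_interpreter()
    print(f"Interpreter: {path}")
    print(f"Version:     {version}")
    if version != "3.8.5":
        # The guide states the tool has only been tested with Python 3.8.5.
        print("Note: this is not Python 3.8.5, the only version this guide reports as tested.")
```

If the printed path is not the Python 3.8.5 install you expect, use the full path to python.exe as described in step 6.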

 

 

 

II. Create and Configure a Virtual Environment

You can skip this section if you already know how to create a virtual environment and add packages to it using pip.

 

  1. Open a new command prompt
  2. Create a new folder in which to store your virtual environments:
    mkdir my_envs
  3. Navigate into your new folder:
    cd my_envs
  4. Create a new virtual environment:
    python -m venv first_environment
  5. See your new environment by listing folder contents:
    dir
  6. Install pandas package into the new environment:
    first_environment\Scripts\pip.exe install pandas
    JeffA_0-1624917968323.png

     

Note: To pass data between Alteryx workflows and your notebook, you currently need pandas. Notebooks can run without pandas, but it is currently the only way to pass data into or out of a notebook.
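The environment creation step can also be scripted with Python's standard venv module, which is what python -m venv uses. The sketch below assumes you want the same folder layout as above; the helper names create_environment and pip_path are hypothetical.

```python
import os
import venv
from pathlib import Path


def create_environment(env_dir, with_pip=True):
    """Create a virtual environment at env_dir, similar to `python -m venv env_dir`."""
    venv.EnvBuilder(with_pip=with_pip).create(str(env_dir))
    return Path(env_dir)


def pip_path(env_dir):
    """Return where pip is expected to live inside the environment."""
    env_dir = Path(env_dir)
    if os.name == "nt":
        # Windows layout, as used throughout this guide.
        return env_dir / "Scripts" / "pip.exe"
    # Layout used on other operating systems.
    return env_dir / "bin" / "pip"
```

For example, calling create_environment(Path("my_envs") / "first_environment") and then running the executable returned by pip_path with the arguments install pandas mirrors steps 2 through 6.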

 

 

 

III. Install and Run Jupyter Notebook

You can skip this step if you already know how to use Jupyter. Simply note that DLL kernel errors may be fixable by installing pywin32==300 in the environment you use to run Jupyter.

 

 

  1. In your environment, install jupyter:
    first_environment\Scripts\pip.exe install jupyter
  2. A known issue can cause DLL kernel errors in Jupyter. Installing this version of pywin32 may fix it:
    first_environment\Scripts\pip.exe install pywin32==300
  3. Open Jupyter
    first_environment\Scripts\jupyter.exe notebook
  4. If the previous command succeeded, you should see the Jupyter notebook page in your default browser:

    JeffA_0-1624913434121.png
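If Jupyter does not start, it can help to check which packages the environment's interpreter can actually import. The following is a rough sketch, run with first_environment\Scripts\python.exe. It assumes that installing jupyter provides an importable module named notebook; the helper name missing_packages is illustrative.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "notebook" is assumed to be the module installed by `pip install jupyter`.
    missing = missing_packages(["pandas", "notebook"])
    if missing:
        print("Not installed in this environment:", ", ".join(missing))
    else:
        print("pandas and notebook are both available.")
```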

 

 

 

IV. Create a Simple Jupyter Notebook

You can skip this step if you already know how to create and run Jupyter notebooks.

 

 

  1. Click "New" -> "Python 3"

    JeffA_1-1624913618673.png

  2. You should now see an empty notebook

    JeffA_2-1624913688687.png

  3. Add the following code to the notebook:
    import pandas as pd
    data = pd.DataFrame({"text": ["Jupyter", "by", "itself"], "number": [1,2,3]})
    data
  4. Run the notebook. You should see something like this:

    JeffA_0-1624926541451.png

  5. In the last cell of the notebook, type the following command and run the cell:
    !dir
  6. You should now see the following output. Copy the directory path and save it for later:

    JeffA_5-1624940038786.png

     

  7. Select "Untitled" at the top of the notebook to change its name:

    JeffA_2-1624933979139.png

  8. Give your notebook a name and click "Rename":
    JeffA_3-1624934022907.png
  9. Save the notebook using ctrl+s
  10. If you would like to know more about Jupyter, see the Jupyter website.
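The !dir command in step 5 is specific to Windows. As an alternative that should work on any operating system, a cell like the following prints the notebook's working directory. This sketch assumes the kernel was started in the notebook's own folder, which is the usual default; the function name notebook_directory is illustrative.

```python
from pathlib import Path


def notebook_directory():
    """Return the absolute path of the current working directory as a string."""
    return str(Path.cwd().resolve())


print(notebook_directory())
```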

 

 

 

V. Run a Jupyter Notebook from an Alteryx Workflow

Summary: After you specify a notebook and a site-packages folder, the tool builds an environment for the notebook and then runs the notebook in it. The environment is built only the first time a new set of packages is specified, which can take some time. If the packages or their versions change, the environment is built again.

 

 

  1. Open Alteryx Designer and open a new workflow
  2. Find the Jupyter Flow tool in the search bar or Laboratory and drag the tool to the canvas:

    JeffA_1-1624932095715.png

  3. At a minimum, the tool requires a notebook and the environment packages
  4. Click the Browse button next to the "Notebook" field and paste the notebook's directory name (saved previously) into the navigation bar and press "Enter":

    JeffA_8-1624939074222.png

  5. Select the notebook previously created and click "Open":

    JeffA_1-1624934350072.png

  6. Back in the configuration pane, click the "Browse" button next to the "Packages" field. Navigate to your environment's Lib/site-packages folder and click "OK":

    JeffA_7-1624938985863.png

  7. The configuration pane should now look something like this:

    JeffA_3-1624934614362.png

  8. Run the workflow
  9. The first time Jupyter Flow runs with a new environment, you will see a message indicating that the environment is building. The amount of time this takes varies greatly depending on the size of your site-packages folder. Once your environment has been built, this step will no longer occur unless the environment changes.

    JeffA_4-1624936684506.png

  10. My first run took 2 minutes and 24 seconds:

    JeffA_5-1624936740326.png

  11. Run the workflow again. Now that the environment has been built, the environment build step is skipped. My second run took 3.9 seconds:

    JeffA_6-1624936812880.png

  12. Note that no data is passing through the notebook yet. This setup is only useful if you want to schedule your notebooks but do not care about passing Alteryx data into or out of them. Read on to find out how to connect Alteryx data to your notebook!
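If you are unsure where your environment's Lib/site-packages folder is (step 6), you can ask the environment's own interpreter. The sketch below uses the standard sysconfig module and should be run with first_environment\Scripts\python.exe so that it reports that environment's folder; the function name site_packages_dir is illustrative.

```python
import sysconfig


def site_packages_dir():
    """Return the directory where this interpreter installs pure-Python packages."""
    return sysconfig.get_paths()["purelib"]


if __name__ == "__main__":
    print(site_packages_dir())
```

The printed path is the one to select in the "Packages" field.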

 

 

 

VI. Pass Data Through your Notebook

Summary: Adding #ayx_input=<<name of alteryx input connection>> above a variable assignment will replace that variable with a dataframe representation of the data flowing through the input connection specified. Adding #ayx_output=<<output anchor number>> above a variable or a variable assignment will output the variable (assuming it's a pandas dataframe) to the specified output anchor in the workflow.

 

 

  1. Open the Jupyter notebook
  2. Add a cell to the bottom of the notebook with the following code and save the notebook:
    #ayx_output
    data
  3. The notebook should look like this:

    JeffA_4-1624939953619.png

  4. Run the workflow
  5. Select the first output anchor of the tool. It should show the following outputs:

    JeffA_3-1624939915433.png

  6. In the Jupyter notebook change "#ayx_output" to "#ayx_output=3" and save:

    JeffA_0-1624940500594.png

  7. Run the workflow
  8. Select the third output anchor of the tool. It should show the following outputs and the first anchor should no longer have any outputs:

    JeffA_7-1624937767563.png

  9. Drag a TextInput tool to the canvas and configure as follows:

    JeffA_1-1624937283443.png

  10. Connect the TextInput tool to the input anchor:
    JeffA_2-1624937330857.png

  11. In your notebook, add "#ayx_input" above the assignment of "data" and save the notebook:

    JeffA_6-1624938720013.png

  12. Run the workflow
  13. The third output anchor should now contain the data from the Text Input tool:

    JeffA_9-1624938088277.png

  14. Add a new TextInput tool and configure as shown:

    JeffA_0-1624939760696.png

  15. Connect the TextInput tool to the input anchor:

    JeffA_0-1624938386115.png

  16. The input tag must now specify which connection to use.
  17. Open the Jupyter notebook and change "#ayx_input" to "#ayx_input=#2":

    JeffA_2-1624938515310.png

  18. Run the workflow again
  19. The third output anchor should show data from the second Text Input tool:

    JeffA_1-1624939800472.png

  20. Jupyter Flow can make use of multiple inputs at the same time. Simply use the "#ayx_input=" tag plus the name of the input connection.
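After the steps above, the notebook should contain roughly the source held in NOTEBOOK_SOURCE below. The small helper around it is a hypothetical checker for reviewing which tags a notebook's code contains before you run the workflow. It is not how Jupyter Flow itself parses tags, and the names TAG_PATTERN and find_ayx_tags are made up for this sketch.

```python
import re

# Matches a whole comment line such as "#ayx_input", "#ayx_input=#2" or "#ayx_output=3".
TAG_PATTERN = re.compile(r"^#ayx_(input|output)(?:=(\S+))?\s*$")

# Notebook code as it should look after step 17 of this section.
NOTEBOOK_SOURCE = '''\
import pandas as pd

#ayx_input=#2
data = pd.DataFrame({"text": ["Jupyter", "by", "itself"], "number": [1, 2, 3]})

#ayx_output=3
data
'''


def find_ayx_tags(source):
    """Return (kind, value, next_line) per tag; value is None when no '=' part is given."""
    lines = source.splitlines()
    found = []
    for index, line in enumerate(lines):
        match = TAG_PATTERN.match(line)
        if match:
            # The tag applies to the line directly below it.
            following = lines[index + 1].strip() if index + 1 < len(lines) else ""
            found.append((match.group(1), match.group(2), following))
    return found


for kind, value, target in find_ayx_tags(NOTEBOOK_SOURCE):
    print(f"{kind} tag (value: {value}) applies to: {target}")
```

To read from several connections at once, repeat the "#ayx_input=" tag above each variable assignment with the corresponding connection name.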

 

 

 

VII. Share Your Jupyter Workflow

Summary: Using Alteryx Designer's built-in export workflow feature will export all of the assets required for another user (or a Server) to run a workflow containing this tool. The environment packages will be exported inside the .yxzp file.

 

 

  1. In the configuration pane, deactivate the "Packages" toggle:

    JeffA_1-1624942361034.png

  2. Save your workflow
  3. Navigate to Options -> Export Workflow:

    JeffA_2-1624942428911.png

  4. Ensure your notebook (.ipynb file) and the zip app (.pyz file) are selected in the "Export Workflow" dialogue:

    JeffA_4-1624942519223.png

  5. Click "Save"
  6. You may now share the exported ".yxzp" file with anyone else who has the Jupyter Flow tool installed on their machine.
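Before sharing, you may want to double-check that the notebook and the zip app made it into the package. The sketch below assumes that a .yxzp file is a standard zip archive that Python's zipfile module can open; the helper names bundled_suffixes and missing_assets are hypothetical.

```python
import zipfile
from pathlib import PurePosixPath


def bundled_suffixes(package_path):
    """Return the set of lowercase file extensions found inside the archive."""
    with zipfile.ZipFile(package_path) as archive:
        return {PurePosixPath(name).suffix.lower() for name in archive.namelist()}


def missing_assets(package_path, required=(".ipynb", ".pyz")):
    """Return the required extensions that do not appear anywhere in the archive."""
    found = bundled_suffixes(package_path)
    return [suffix for suffix in required if suffix not in found]
```

Calling missing_assets on the path of your exported package should return an empty list when both the .ipynb and .pyz files were selected in step 4.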

 

 

 

VIII. Run Your Jupyter Workflow from Server

  1. In the configuration pane, deactivate the "Packages" toggle as shown in the "Share Your Jupyter Workflow" section.
  2. Ensure the tool has been installed on your desired gallery instance
  3. Using your preferred method, save your Jupyter workflow to your gallery
  4. Run the workflow on Server just like any other Alteryx workflow

 

Banner image by Beate Bachman

Comments
atcodedog05
18 - Pollux

@JeffA This is a super helpful article 🙂👍

JeffA
Alteryx

Great to hear @atcodedog05. Let me know how well the tool works for you!