Alteryx Designer Knowledge Base


Tool Mastery | Python

Data Scientist

This article is part of the Tool Mastery Series, a compilation of Knowledge Base contributions to introduce diverse working examples for Designer Tools. Here we'll delve into uses of the Python Tool on our way to mastering Alteryx Designer.

 

Python is one of the fastest-growing programming languages in the world and is used for a wide variety of applications, from basic data manipulation to data science and software development. With the release of Designer 2018.3 comes the long-awaited Python Tool! Much like the R Tool, the Python Tool lets you run code seamlessly as part of your Alteryx workflow. Also like the R Tool, you will need some experience with the language to use this tool to its full potential. In this Tool Mastery article, we will introduce the fundamentals of using this tool.

 

When you first drop the Python Tool onto your canvas, you will see the following screen in the tool's configuration window. It is a reminder to run your workflow whenever you connect the Python Tool to a new input data source: running the workflow pulls the input data into the Python Tool so that you can bring it into your Python code.

 

[Screenshot: the message shown in the Python Tool configuration window before the notebook loads]

 

 

As the message explains, all you need to do to get the Jupyter Notebook interface up and running is wait. It takes a couple of seconds for the interface to load the first time you open a Python Tool in an instance of Designer; the message you first see will then be replaced with a Jupyter Notebook interface.

 


 

 

For a general introduction to Jupyter Notebooks, please review the Jupyter Notebook Beginner's Guide.

 

 

The first coding step in using the Python Tool is to import the Alteryx API package, which allows you to pass data between the Alteryx Engine and Python Tool. If you plan on reading in data from the Alteryx Engine or pushing data out to the Engine from the Python Tool, your code should start with:

 

from ayx import Alteryx

 

This piece of code is so fundamental it is automatically populated in the first cell of the Python Tool!

 


 

 

To run an individual cell in the Python Tool, click the play (Run) button in the top toolbar or use the keyboard shortcut Shift + Return.

 


 

 

In addition to the ayx package, the Python Tool comes with several Python packages installed by default. These packages are listed in the help documentation and relate primarily to data science; there is also a great article that reviews the functionality of each of these pre-installed packages. To load a package that is already installed, use an import statement, as you would in a Python script outside of Alteryx. To install a Python library that is not included with the tool by default, use the Package.installPackages() function.
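Because installing a package takes time, one option is to check which modules are already importable before calling Package.installPackages(). The sketch below is a minimal illustration under assumptions: the helper missing_packages is hypothetical. The install call is left as a comment, assuming Package is imported from ayx and accepts a list of names, so the snippet also runs outside Designer. Note that the name you import can differ from the name you install (for example, sklearn is installed as scikit-learn).

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot currently be imported.

    Hypothetical helper for illustration; it only checks importability.
    """
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


wanted = ["json", "some_package_that_is_not_installed"]
to_install = missing_packages(wanted)
print(to_install)

# Inside the Python Tool you could then install only what is missing
# (assumption: installPackages accepts a list of installable names):
# from ayx import Package
# if to_install:
#     Package.installPackages(to_install)
```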

 


 

 

The asterisk (*) shown where the cell number is usually displayed means that the cell is still running.

 

When a package installs successfully, you will see some variation of the following messages about its dependencies and the version installed.

 

[Screenshot: messages printed after a successful package installation]

 

 

Optional Follow Along: If you'd like to follow along with this demonstration, please download the Iris Dataset attached to this article!

 

If you are bringing in data through the Input Anchor in Alteryx, you will need to run the workflow to make the incoming data available to the notebook. After running the workflow, you can use the Alteryx.read() function to bring the data into Python.

The only argument to this function is the name of the connection you are reading in. As in the R Tool, this argument is a string, so it must be enclosed in quotation marks.


To read this data stream into a variable named data, the code would be:

 

data = Alteryx.read("#1")

 


 

 

If you try to read in data before running the entire workflow, you will likely see this FileNotFoundError:

 

[Screenshot: FileNotFoundError raised by Alteryx.read() before the workflow has been run]

 

The solution is to save the workflow and then run it. The next time you run the cell with the play button, the error should be resolved.
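If you would like a more explicit reminder than the raw FileNotFoundError, you can wrap the read in a small helper. This is a minimal sketch. The name read_input is hypothetical, and the reader function is passed in as an argument (Alteryx.read inside the tool) so the code can run without the ayx package.

```python
def read_input(read, connection="#1"):
    """Read an input connection, replacing FileNotFoundError with a clearer hint.

    `read` is the function that reads a connection; inside the Python Tool
    that would be Alteryx.read (passed in so this sketch runs anywhere).
    """
    try:
        return read(connection)
    except FileNotFoundError as err:
        raise RuntimeError(
            f"No data available for input {connection}: save and run the "
            "workflow, then run this cell again."
        ) from err


# Inside the Python Tool (assumption: ayx is available there):
# from ayx import Alteryx
# data = read_input(Alteryx.read, "#1")
```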

 

Everything read into the Python Tool arrives as a pandas data frame, which gives you a lot of flexibility for processing the data in Python. You can convert the data to another format (for example, a NumPy array) after reading it in, but any output must be converted back to a pandas data frame.
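Here is a minimal sketch of that round trip, assuming a purely numeric input. A small hand-built frame stands in for Alteryx.read("#1"), and its column names are only illustrative. The data is converted to a NumPy array and processed, and then a data frame is rebuilt with the original column names and index.

```python
import numpy as np
import pandas as pd

# Stand-in for `data = Alteryx.read("#1")`; columns are illustrative only
data = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]})

# Leave pandas: min-max scale each column as a plain NumPy array
values = data.to_numpy()
col_min = values.min(axis=0)
col_max = values.max(axis=0)
scaled = (values - col_min) / (col_max - col_min)

# Return to pandas before handing anything to Alteryx.write()
result = pd.DataFrame(scaled, columns=data.columns, index=data.index)
print(result)
```

Reusing the original columns and index keeps the output aligned with the input rows, which makes it easy to join back to other data downstream.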

 

Now that I have brought in my data, I would like to analyze it. First, I will create a new cell by clicking the plus icon next to the Save and Checkpoint button. Alternatively, I could press B in command mode (press Esc first if you are editing a cell) to add a cell below the current one.

 


 

 

Other useful cell and notebook functions are in the same toolbar. From left to right, the buttons are Save, Add a Cell, Cut Cell(s), Copy Cell(s), Paste Cell(s), Move Cell(s) Up, Move Cell(s) Down, Run, Stop, Restart the Kernel, and Restart the Kernel and Rerun the Notebook. All of these buttons have associated keyboard shortcuts. You can see a full list of Jupyter Notebook keyboard shortcuts by navigating to Help > Keyboard Shortcuts in the top menu bar.

 

 

For this demonstration I want to run cluster analysis on the famous Iris data set, so in my new cell I will import the KMeans class from the scikit-learn Python package (included with the Alteryx Python Tool installation) and write some simple code to create clusters and print the resulting cluster labels.
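The code for this step is not reproduced in this text, so here is a minimal sketch of the same idea. It assumes scikit-learn's bundled copy of Iris as a stand-in for the Alteryx input; inside the tool you would use data = Alteryx.read("#1") and select your own numeric columns. The three clusters, n_init, and random_state are illustrative choices, not necessarily the article's settings.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Stand-in for the Alteryx input: the four numeric Iris measurements as a DataFrame
data = load_iris(as_frame=True).data

# Fit k-means with three clusters; fixing n_init and random_state makes runs repeatable
model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(data)

# One cluster label (0, 1, or 2) per input row
print(labels)
```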

 


 

Now I can visualize my clusters with matplotlib.pyplot (matplotlib is also included with the Python Tool by default).
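As a rough sketch under stated assumptions, the helper below draws a scatter plot of two columns, colored by cluster label. The helper name plot_clusters is hypothetical. The small demo frame and its column names are made up for illustration; in the notebook you would pass your own data with the labels added as a column. Drop the matplotlib.use("Agg") line inside the Python Tool, where figures render inline; it is only there so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_clusters(df, x, y, label_col):
    """Scatter two numeric columns, coloring each point by its cluster label."""
    fig, ax = plt.subplots()
    points = ax.scatter(df[x], df[y], c=df[label_col], cmap="viridis")
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    # legend_elements builds one legend entry per distinct label value
    ax.legend(*points.legend_elements(), title="Cluster")
    return fig, ax


# Illustrative data: two measurements plus a cluster label per row
demo = pd.DataFrame(
    {
        "petal_length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.9],
        "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 2.1],
        "cluster": [0, 0, 1, 1, 2, 2],
    }
)
fig, ax = plot_clusters(demo, "petal_length", "petal_width", "cluster")
```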

 


 

Finally, you can write an output from the Python Tool using the Alteryx.write() function. This function currently supports only pandas data frames; if you attempt to write out anything other than a data frame, you will get the following TypeError.

 

[Screenshot: TypeError raised by Alteryx.write() for an object that is not a pandas data frame]

 

This error can be resolved by converting your output to a pandas data frame. If you are not yet familiar with pandas data frames, you might find the pandas documentation's introduction to data structures or its 10 minutes to pandas guide helpful. Once you have written the code with Alteryx.write() in the Python Tool, you will need to run the entire workflow to see the results in the tool's output anchors.
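Here is a minimal sketch of preparing such an output. It assumes the cluster labels come back as a NumPy array, as fit_predict returns them, and wraps them in a data frame before writing. The helper to_frame is hypothetical. The Alteryx.write(output, 1) call, which targets output anchor 1, is guarded so the snippet also runs outside Designer, where ayx is not installed.

```python
import numpy as np
import pandas as pd

try:
    from ayx import Alteryx  # only available inside the Python Tool
except ImportError:
    Alteryx = None


def to_frame(obj, name="value"):
    """Wrap common non-DataFrame results so Alteryx.write() will accept them."""
    if isinstance(obj, pd.DataFrame):
        return obj
    if isinstance(obj, pd.Series):
        return obj.to_frame(name=obj.name if obj.name is not None else name)
    # Lists, tuples, and 1-D NumPy arrays become a single named column
    return pd.DataFrame({name: np.asarray(obj)})


labels = np.array([0, 1, 1, 2])  # e.g. the result of KMeans.fit_predict
output = to_frame(labels, name="cluster")

if Alteryx is not None:
    Alteryx.write(output, 1)  # send the data frame to output anchor 1
```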

 


 

Now, all that is left to do is run the workflow, and the results will be populated in anchor 1 of the Python Tool Outputs.

 

With this overview, I hope you feel comfortable reading in, writing out, and processing data in the Python Tool. The only limits now are your imagination!

 

 

Things to know and Future Updates!

 

  • Starting with 2018.4, you can load externally created python scripts and Jupyter notebooks.
  • Metadata will not consistently populate in downstream tools for data coming out of the Python Tool.
  • There is an implicit type conversion from Boolean to integer on reading data into the Python Tool. Likewise, there is another implicit type conversion from Boolean to integer on writing out from the tool.
  • Starting with 2018.4, you now have the ability to set column data types when writing an output.
  • Only pandas data frames are currently supported for reading and writing. You cannot currently write out a plot, or read in and write out spatial objects.
  • Question Constants are not currently supported.
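Because Boolean fields arrive as integers, you may want to cast them back after reading. Here is a minimal sketch. It assumes you know which columns were Boolean upstream (the column name is_active is hypothetical) and that they contain only 0 and 1. Per the note above, these columns will be written out as integers again.

```python
import pandas as pd

# Stand-in for Alteryx.read("#1"): a Boolean field that arrived as 0/1 integers
data = pd.DataFrame({"id": [1, 2, 3], "is_active": [1, 0, 1]})

bool_columns = ["is_active"]  # hypothetical: columns known to be Boolean upstream
for col in bool_columns:
    # Refuse to cast anything but 0/1 so genuine integer fields are not silently converted
    if not data[col].isin([0, 1]).all():
        raise ValueError(f"{col} contains values other than 0 and 1")
    data[col] = data[col].astype(bool)

print(data.dtypes)
```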

 

If you have any feedback for us on this tool, please post to the Product Ideas page! Our Product Managers are very active there and would love to hear your ideas for new features, or about any limitations you encounter in the tool.

 

By now, you should have expert-level proficiency with the Python Tool! If you can think of a use case we left out, feel free to use the comments section below! Consider yourself a Tool Master already? Let us know at community@alteryx.com if you’d like your creative tool uses to be featured in the Tool Mastery Series.

 

Stay tuned with our latest posts every #ToolTuesday by following @alteryx on Twitter! If you want to master all the Designer tools, consider subscribing for email notifications.

Comments
Meteoroid
Ok, so df = pd.DataFrame() is a call to pandas. I'd suggest you check out datacamp.com, Udemy, or YouTube to get the most out of pandas. But in short, df in this instance is your variable name, and it's going to hold your DataFrame, i.e. a spreadsheet-like table of data.
Meteoroid

In addition to @D3100's comment, that line was there to set up the output, since the inputs and outputs of the Python Tool must be pandas DataFrame objects. I merely drilled into the input data frame to extract the cell containing the file path you put in, and wrapped it in a new data frame for output.

 

Please post a new thread if you have any other issues.