Tool Mastery

SydneyF · ‎08-28-2018

This article is part of the Tool Mastery Series, a compilation of Knowledge Base contributions to introduce diverse working examples for Designer Tools. Here we’ll delve into uses of the Python Tool on our way to mastering
the Alteryx Designer:

Python is one of the fastest growing programming languages in the world and is used for a wide variety of applications ranging from basic data manipulation to data science and software development. With the release of 2018.3 comes the long-awaited and highly anticipated Python Tool! Much like the R-Tool, the Python Tool allows you to seamlessly run code as a part of your Alteryx workflow. Also like the R-Tool, you will need to have some coding experience with the named language in order to use this tool to its maximum potential. In this Tool Mastery Article, we will introduce you to the fundamentals for using this tool.

When you first drop the Python Tool on to your Canvas you will see the following screen in the tool’s configuration window. This is a reminder to run your workflow whenever you connect your Python Tool to a new input data source. This pulls the input data into the Python Tool so that you can bring it into your Python Code.

As described in this text, to get theJupyter Notebook interface up and running, all you need to do is wait. IT takes a couple seconds for the Jupyter Notebook interface to get served the first time you open a Python Tool in an instance of Designer. The message you first see will be replaced with a Jupyter notebook interface.

For a general introduction to Jupyter Notebooks, please review their Beginner's Guidedocumentation.

The first coding step in using the Python Tool is to import the Alteryx API package, which allows you to pass data between the Alteryx Engine and Python Tool. If you plan on reading in data from the Alteryx Engine or pushing data out to the Engine from the Python Tool, your code should start with:

from ayx import Alteryx

This piece of code is so fundamental it is automatically populated in the first cell of the Python Tool!

To run an individual cell in the Python Tool, you click the play button in the top toolbar, or you can use the keyboard shortcut: shift + return.

In addition to the ayx package, the Python Tool comes with a few python packages loaded by default. These packages are listed in the help documentation and primarily relate to Data Science. There is also a great article that reviews the functionality of each of these pre-installed packages. To load a package that is already installed, you can use the import command, as you would when creating a Python Script outside of Alteryx. If you would like to install a python library that is not included with the tool by default, you can use the Package.installPackages() function.

The little * asterisk where the cell number isusually displayed means that the cell is currently running.

On the success of installing a package, you will see some variation of the following messages related to dependencies and the version of the package installed.

Optional Follow Along:If you'd like to follow along with this demonstration, please download the Iris Dataset attached to this article!

If you are bringing in data through the Input Anchor in Alteryx, you will need to run the workflow to make the incoming data available to the notebook. After running the workflow, you can use the Alteryx.read() function to bring the data into Python.

The only argument to this function is the specific connection you are reading in. Like in the R Tool, this argument is a string and will need to have quotations around it.

To read in this data stream as the variable name data, the code would read:

data = Alteryx.read("#1")

If you try to read in data before running the entire workflow, you will likely see this FileNotFoundError:

The solution is to save the workflow and then run the workflow. The next time you run the code in the cell with the play button, the error should be resolved.

Everything read into the Python Tool will be read in as a pandas data frame. This enables greater flexibility for processing the data in Python. You can change the data format after reading it in, but you will need to return any outputs back to a pandas data frame.

Now that I have brought in my data, I would like to analyze it. First, I will create a new cell by clicking the plus icon next to the save/create checkpoint button, or I could use the keyboard shortcut B to add a cell below my current cell.

Other useful cell and notebook functions can also be found in this toolbar to the right of the insert cell below button. From left to right, the buttons are Save, Add a Cell, Cut Cell(s) , Copy Cell(s)),Paste Cell(s), Move Cell(s) Up, Move Cell(s) Down, Run, Stop, Restart the Kernel,and Restart the Kernel and Rerun the Notebook. All of these buttons have associated keyboard shortcuts. You can see a full list of Jupyter Notebook keyboard shortcuts by navigating to Help > Keyboard Shortcuts in the top toolbar.

For this demonstration I want to run cluster analysis on the infamous Iris data set, so in my new cell I will load the KMeansfunction from the Sci-kit learn Pythonmodule(included with the Alteryx Python Tool Installation), and write some simple code to create clusters and print the resulting cluster labels.

Now, Ican visualize my clusters with the matplotlib.pyplot python library (also included with the Python Tool by default).

Finally, writing an output from the Python Tool can be done using the with Alteryx.write() function. This function is currently only supported for pandas data frames. If you attempt to write something out other than a data frame, you will get the following TypeError.

This error can be resolved by converting your output to a pandas data frame. If you are notyet familiar with pandas data frames, you might find the introduction to pandas data structuresor the 10 minutes to pandas documentationhelpful.Once you write the code with Alteryx.write() in the Python Tool, you will need to run the entire workflow to see the results in the output anchors of the tool.

Now, all that is left to do is run the workflow, and the results will be populated in anchor 1 of the Python Tool Outputs.

With this overview, I hope you feel comfortable reading in, writing out, and processing data in the Python Tool. The only limits now are your imagination!

Things to know and Future Updates!

Starting with 2018.4, you can load externally created python scripts and Jupyter notebooks.
Metadata will not consistently populate in downstream tools for data coming out of the Python Tool.
There is an implicit type conversion from Boolean to integer on reading data into the Python Tool. Likewise, there is another implicit type conversion from Boolean to integer on writing out from the tool.
Starting with 2018.4, you now have the ability to set column data types when writing an output.
Only Pandas Data frames are currently supported for reading and writing out. You can not currently write out a plot, or read in and write out spatial objects.
Question Constants are not currently supported.

If you have any feedback for us on this tool, please post to the Product Ideas Page! Our Product Managers are very active here and would love to see any ideas for features or limitations within the Tool you encounter.

By now, you should have expert-level proficiency with the Python Tool! If you can think of a use case we left out, feel free to use the comments section below! Consider yourself a Tool Master already? Let us know atcommunity@alteryx.comif you’d like your creative tool uses to be featured in the Tool Mastery Series.

Stay tuned with our latest posts every#ToolTuesdayby following@alteryxon Twitter! If you want to master all the Designer tools, considersubscribingfor email notifications.

Additional Information

Click on the corresponding language link below to access this article in another language -

Portuguese
Spanish
French
German
Japanese

D3100 · ‎12-18-2019

Ok so df = dataframe() is function call to pandas. I'd suggest you check out datacamp.com or udemy or youtube to get the most of out of pandas. But in short, df in this instance is your variable name, and its going to hold your 'dataframe' , aka Spreadsheet of data.

Sunny_P · ‎12-18-2019

In addition to @D3100's comment, that line was made to set up the output since the inputs and outputs of the Python Tool must have the datatype as a Pandas Dataframe object. I merely drilled into the input dataframe to extract the cell with the filepath you put in, and wrapped it into a new dataframe for output.

Please post a new thread if you have any other issues.

gavinott1 · ‎04-22-2020

Hi,

How can I pick out just one value, as opposed to a whole table? In part of my script, I'm trying to piece together a URL, and one of the pieces comes from a Text Input tool that contains only 1 record ("100"), but when I try to insert this it inserts into the URL "Record 0 100" instead of just "100". Can anyone help me with this?

Thank you!

Gavin Ott

fpinchon · ‎02-14-2021

@gavinott1

To pick only the first record, use this:
dat = Alteryx.read('#1')
NAME_OF_YOUR_VARIABLE = dat['NAME_OF_YOUR_COLUMN'].iloc[0]

Cheers,

Frederic

fpinchon · ‎02-14-2021

At the end of the article, the following improvement is mentioned:

Starting with 2018.4, you now have the ability to set column data types when writing an output.

I can't find any documentation related to how to do that..

Any idea?

Cheers,

Frederic

NeilR · ‎02-16-2021

Hi @fpinchon, this functionality is described by running Alteryx.help() interactively in the tool. An example:

from ayx import Alteryx
import pandas as pd
df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
Alteryx.write(df,1,{'col1': {'name': 'Field1', 'type': 'Int32'},'col2': {'name': 'Field2', 'type': 'Int64'}})

Screenshot 2021-02-16 082240.png

NeilR · ‎02-16-2021

Here are the full results of running Alteryx.help():

Code snippets for passing data between Alteryx and Jupyter

(See help documentation for additional details.)

Alteryx.read( "<input connection name>" )
Input data will be returned as pandas dataframe. [Note: You must run the workflow to cache the incoming data and make it accessible within this interactive notebook.]*

df = Alteryx.read("#1")
► SUCCESS: reading input data "#1"
Alteryx.readMetadata( "<input connection name>" or <pandas dataframe> )
Input field metadata will be returned as dict. [Note: You must run the workflow to cache the incoming data and make it accessible within this interactive notebook.]*

md = Alteryx.readMetadata("#1"); print(md);
► {'Field1': {'type': 'Byte', 'length': 1, 'source': 'TextInput:', 'description': ''}}
md = Alteryx.readMetadata(df); print(md);
► {'Field1': {'type': 'Byte', 'length': 1, 'source': '', 'description': ''}}
Alteryx.write( <pandas dataframe>, <output anchor number> , [metadata list or dict] )
A preview of the data will be displayed in Jupyter, but the full dataframe will be passed to Alteryx when the workflow is executed. Metadata can optionally be passed in, either as a list (using column order) or dict (using column names)*

Alteryx.write(df, 1)
► SUCCESS: writing outgoing connection data 1
Alteryx.write(df, 2, Alteryx.readMetadata("#1"))
► SUCCESS: writing outgoing connection data 2
Alteryx.write(df, 3, {'Field1': {'name': 'Field1_as_double', 'type': 'Double'}})
► SUCCESS: writing outgoing connection data 3
Alteryx.write(df, 4, [{'name': 'Field1_as_string', 'type': 'V_String', 'length': 8}])
► SUCCESS: writing outgoing connection data 4
Alteryx.getIncomingConnectionNames( )
A list containing all incoming data connections will be returned. If the connections look out of sync, re-run the Alteryx workflow. (As with the read function, a snapshot of the data from the previous run is used when the function is called interactively.)*

Alteryx.getIncomingConnectionNames()
► ["#1", "#2", "model"]
Alteryx.installPackages( *"<package name or list of package names>" *)
Package(s) will be installed from PyPI. [Note: An internet connection is required. Also, if using an admin install of Alteryx, Alteryx must be opened in admin mode to install packages. Non-admin installs do not have this restriction.]*

Alteryx.installPackages("tensorflow")
Alteryx.installPackages(["keras","theano","gensim"])
Alteryx.getWorkflowConstant( *"<constant name>" *)
Returns the value of an Alteryx workflow constant (these can be seen on the Workflow Configuration's Workflow tab in Alteryx).*

Alteryx.getWorkflowConstant("Engine.TempFilePath")
► 'C:\Users\myuser\AppData\Local\Temp\Engine_17376_1afec4920f674e3a8c29be6225048da8_\'
Alteryx.getWorkflowConstants( )
The value of all Alteryx workflow constants will be returned in a dict.*

Alteryx.getWorkflowConstants()
► { 'Engine.GuiInteraction': '1',
'Engine.IterationNumber': 0,
'Engine.ModuleDirectory': 'C:\my_workflows\',
'Engine.TempFilePath': 'C:\Users\myuser\AppData\Local\Temp\Engine_17376_1afec4920f674e3a8c29be6225048da8_\',
'Engine.Version': '2018.3.0.9999',
'Engine.WorkflowDirectory': 'C:\my_workflows\',
'Engine.WorkflowFileName': 'my_workflow.yxmd'
}
Alteryx.importPythonModule( *"<path>"*, [list of submodules] )
A module object will be returned, representing the python file or directory of files specified. By default, if a directory is provided, all submodules will be imported, but a list of submodules can be specified using the optional* submodules argument.

myscript = Alteryx.importPythonModule("C:\\documents\\my_script.py")
myscript.square(3)
► 9
mypkg = Alteryx.importPythonModule("C:\\documents\\my_package")
mypkg.module1.half( mypkg.subpkg.max_value )
► 4

The above code snippets are based on the following files and folders

C:\documents  
C:\documents\my_script.py  
^^^ (contains a function square() that returns the input value squared)  
C:\documents\my_package  
C:\documents\my_package\__init__.py  
C:\documents\my_package\module1.py  
^^^ (contains a function half() that returns half of the input value)  
C:\documents\my_package\subpkg  
C:\documents\my_package\subpkg\__init__.py  
^^^ (contains a variable max_value = 8)

fpinchon · ‎02-16-2021

@NeilR Great! very helpful, thanks Neil!

AndrewCrayford · ‎02-17-2021

I trying to read in some text from a field in the workflow and then run a bit of pyhton code which i found online to convert base64 into utf8 text.

This is the code I am using

#################################
# List all non-standard packages to be imported by your
# script here (only missing packages will be installed)
from ayx import Package
#Package.installPackages(['pandas','numpy'])

#################################
from ayx import Alteryx

#################################
data = Alteryx.read("#1")

#################################
data

#################################
import base64

encodedStr = data

# Standard Base64 Decoding
decodedBytes = base64.b64decode(encodedStr)
decodedStr = str(decodedBytes, "utf-8")

Alteryx.write(decodedStr,4)

#################################

I am getting the following error message -

Error: Python (15): ---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-5-b10eef4cdde5> in <module>
4
5 # Standard Base64 Decoding
----> 6 decodedBytes = base64.b64decode(encodedStr)
7 decodedStr = str(decodedBytes, "utf-8")
8
c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\base64.py in b64decode(s, altchars, validate)
78 in the input result in a binascii.Error.
79 """
---> 80 s = _bytes_from_decode_data(s)
81 if altchars is not None:
82 altchars = _bytes_from_decode_data(altchars)
c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\base64.py in _bytes_from_decode_data(s)
44 except TypeError:
45 raise TypeError("argument should be a bytes-like object or ASCII "
---> 46 "string, not %r" % s.__class__.__name__) from None
47
48
TypeError: argument should be a bytes-like object or ASCII string, not 'DataFrame'

I am not sure how to fix this and was wondering can help me to get this working or am I doing something wrong?

NeilR · ‎02-17-2021

@AndrewCrayford, the problem, as the last line of the error states, is that the data object you have created to represent the data coming into the Python tool is a Pandas DataFrame (table), not a string, which is what the Python code you found online is expecting. One way to translate your code to work on a DataFrame is as follows:

from ayx import Alteryx
data = Alteryx.read("#1")

import base64

# Standard Base64 Decoding
data['decodedBytes'] = data.Base64_encoded.apply(base64.b64decode)
data['decodedStr'] = data.decodedBytes.str.decode("utf-8")

Alteryx.write(data,1)

See it in action in a workflow (using the out of the box Base64 Encoder tool to create the Base64_encoded string field):

It will take more than this response to properly understand what's going on here. Some resources I'd recommend:

Also consider using these free Base64 Encoder/Decoder tools that you can install into Alteryx Designer.

AndrewCrayford · ‎02-17-2021

@NeilR

I will have a look at them links you have sent over.

I try using the base64 decoder from the Alteryx Gallery but having some issues when I decode any base64 with has special letters such as Russian text it display the text as Ð°Ð²Ñ‚Ð¾Ð¼Ð°Ñ‚Ð¸Ñ‡ÐµÑÐºÐ°Ñ. I think that is the character set is set to ASCII and not UTF8 and I not sure how to change it to UFT8?

NeilR · ‎02-17-2021

@AndrewCrayford you're right it doesn't look like @jdunkerley79's macros work for utf-8. In that case the Python code should do the trick.

AndrewCrayford · ‎02-17-2021

@NeilR is there a limit to the size of the base64 code which used in that Python code?

rgusaas · ‎03-26-2021

Greetings,

I have the below workflow code in the python tool and want to know what's missing to see data in the browse tool at the end? I have not seen any examples of passing the pandas dataframe back to Alteryx so other Alteryx tools can be used after the python tool. The below python code runs successfully and displays the content of the dataframe, but the browse tool is empty after running the workflow.

from ayx import Package
from ayx import Alteryx
import pandas as pd
import numpy as np

df = Alteryx.read("#1");

Alteryx.write(df, 1)

Input Tool:

Results:

Should I not be able to see the data from the dataframe in the Browse tool?

Inactive User · ‎12-28-2021

Python tool in alteryx is a great way to incorporate python code into the alteryx workflow. It gives us another distinct feature which one doesn't think of, when one starts making workflows using the most used tools i.e. the frequently used tools like the one we can encounter in "In/Out", "Preparation", "Join", "Parse" and "Transform".

Let's get into the details of python tool in alteryx

This is how python tool look like in alteryx. It requires one input (not mandatory to provide) and we have five output terminals, through which we can have a look at 5 different dataframes (it’s just table in python language)
We can use the python tool in two different modes
1) Interactive mode : When the workflow is run, Alteryx will print messages within the Jupyter notebook and refresh the data available to the notebook. This is helpful for developing code but less efficient. Use Interactive mode to view inline errors and charts.
2) Production mode : Used to optimize speed and efficiency when you don’t need to inline error messaging.
Input data
Input data is read as pandas dataframe (pandas is one of the various libraries that python has). The input data is read by python tool using the command
df = Alteryx.read("#1")
where df is the name of the dataframe, which you can chose by yourself.
Analysis
Perform your analysis on the pandas dataframe. Create new columns and you can delete redundant columns. You might have to install some libraries and modules depending upon the nature of the analysis, you’re going to perform. For example, using the code below, I’m installing the Python library which allows you to encode and decode JSON Web Tokens (JWT)
pip install pyjwt
Features
You can also directly import a script (jupyter notebook file i.e. .ipynb file). Jupyter Notebook (formerly IPython Notebooks) is a web-based interactive computational environment for creating notebook documents.
There are some tools present in the alteryx which don’t mandatorily need an input to provide the output and python tool is one of them.

You can get more information about Jupyter notebook here
https://jupyter.org/
Output
The output can be written to any of the five output terminals that python tool provides. The command to write the output in the third output terminal is as shown
Alteryx.write(df, 3)
Where df is the name of the dataframe.
This is basically everything you need to know to get going with the python tool. Ofcourse there are more advanced features which you’ll eventually come across, when you start working with it.
One of the mantra to get familiarize with the new tools is to go through the example given in the alteryx. On the right hand side upper corner, you can search for the tool and click on example for the tool you want to learn more about and go through it. This is a great way to see different use cases of a tool and then you can make a strategy for how you can use the tool for your needs.
example example
Thanks for reading

Tool Mastery

Become a Tool Master

Tool Mastery | Python

Code snippets for passing data between Alteryx and Jupyter