If you are building a predictive model, inevitably you will want to analyze the effect that your independent variables have on your dependent variable. This article is meant to shed some light on the Alteryx-specific options for this type of analysis!
Sampling weights, also known as survey weights, are positive values associated with the observations (rows) in your dataset (sample), used to ensure that metrics derived from a data set are representative of the population (the full set of units the sample was drawn from).
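As a minimal sketch of how sampling weights change a metric, the snippet below compares a plain mean with a weighted mean. The scores and weights are made up for illustration, and the `weighted_mean` helper is a hypothetical name, not an Alteryx function.

```python
def weighted_mean(values, weights):
    """Mean of values where each observation counts in proportion to its weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w <= 0 for w in weights):
        raise ValueError("sampling weights must be positive")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical survey: the third respondent stands in for twice as many people.
scores = [10.0, 20.0, 30.0]
weights = [1.0, 1.0, 2.0]

unweighted = sum(scores) / len(scores)
weighted = weighted_mean(scores, weights)
print(unweighted, weighted)
```

The weighted result is pulled toward the third respondent's score because that row represents a larger share of the population.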
With the Python Tool, Alteryx can manipulate your data using everyone’s favorite programming language - Python! Included with the tool are a few pre-built libraries that extend past even the native Python download. This allows you to extend your data manipulation even further than one could ever imagine. The libraries installed are listed here - and below I’ll go into a bit more detail on what these libraries are and why they are so useful.
Each library is well documented, and there’s usually an introduction or examples on their sites to get you started on how a basic function in their library works.
ayx – Alteryx API – simply enough, we’re using Alteryx, sooo yea, kind of a requirement for the translation between Alteryx and Python.
jupyter – Jupyter metapackage – If you’ve used a Jupyter notebook in the past, you’ll notice the interface for the Python Tool is similar. This interface allows you to run sections of code outside of actually running the workflow, which makes understanding and testing your data that much easier.
matplotlib – Python plotting package – Any charting, plotting, or graphical needs you would want will be in this package. This provides a great deal of flexibility for whatever you want to visualize.
NumPy – NumPy, array processing for numbers, strings, records, and objects – Native Python processes data in what some would call a cumbersome way. For instance, if you wanted to make a matrix, such as a 4x4 table, you would need to create a list within a list, which can slow processing a bit. However, NumPy has its own “array” type that stores the data in this matrix pattern, which allows for faster processing. Additionally, it has a bunch of methods for handling numbers, strings, and objects that make processing a whole lot easier and a whole lot faster.
pandas – Powerful data structures for data analysis, time series, and statistics – This is your staple for handling data within Alteryx. Those who have used Python, but never pandas, will enter a whole new beautiful world of data handling and structure. With pandas, data manipulation within Python is faster, cleaner, and easier to code. The best part about it is that the Python Tool will read in your Alteryx data as a pandas data frame! Understanding this library should be one of the first things to know when tackling the Python code.
requests – Python HTTP for Humans – for all the connector/Download Tool fans out there. If any of you are familiar with making HTTP requests (API calls and the like), then you should introduce yourselves to this package and explore how Python performs these requests.
scikit-learn – a set of Python modules for machine learning and data mining – Welcome to the world of machine learning in Python! This library is your go-to for statistical and predictive modeling and evaluation. Any crazy and wild methods you’ve learned for machine learning will most likely be found here and can really push the boundaries of data science.
scipy – Scientific Library for Python – all your scientific and technical computing can be found here. This library builds on NumPy and works well alongside other packages installed here, like pandas and matplotlib. Functions for dealing with mathematical models and formulae are usually located within this library and can help provide that higher level analysis of your data.
six – Python 2 and 3 compatibility utilities – For those who are unfamiliar, Python versions come in two forms, version 2.x and 3.x (with 3.x being the most recent). Now, even though Python 3 is supposed to be the latest and greatest, there are still many users out there who prefer using Python 2. Therefore, integration between the two is a bit tricky with syntax differences, etc. The six module provides functions that are usable between the two so everyone can remain calm and happy! Its documentation usually notes which version each function most closely aligns to, so a user can get a better idea of its functionality.
SQLAlchemy – Database Abstraction Library – SQL in Python! Covers all your database needs from connecting to and extracting data, allowing it to interact with your Python code and thus, Alteryx itself.
statsmodels – statistical computations and models for Python – This library complements scikit-learn but focuses more on statistical tests and data exploration. Additionally, it utilizes R-style formulae with pandas data frames to fit models!
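To make the NumPy and pandas entries above a bit more concrete, here is a minimal sketch of the same 4x4 table as a nested list, a NumPy array, and a pandas data frame. The numbers and column names are made up for illustration; inside the Python Tool the data frame would normally arrive from the workflow through the ayx package (typically via `Alteryx.read`) rather than being built by hand.

```python
import numpy as np
import pandas as pd

# A 4x4 table in plain Python is a list of lists.
nested = [[row * 4 + col for col in range(4)] for row in range(4)]

# The same table as a NumPy array supports whole-array arithmetic.
matrix = np.array(nested)
doubled = matrix * 2                 # element-wise, no explicit loops
column_totals = matrix.sum(axis=0)   # one total per column

# A pandas data frame adds column names and per-column operations.
# These column names are hypothetical.
df = pd.DataFrame(matrix, columns=["q1", "q2", "q3", "q4"])
df["total"] = df[["q1", "q2", "q3", "q4"]].sum(axis=1)

print(column_totals)
print(df)
```

The nested list would need explicit loops for the doubling and the totals, while the array and the data frame express each as a single operation.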
These are the libraries installed with the Python Tool, which can do almost any data function imaginable. Of course, if you’re looking to do something that these libraries don’t provide, there are myriad other Python libraries that I’m sure will help you with your use case. Most of these are also well documented, so search away and let your mind float away in the beautiful cosmos created by Python.
Alteryx has a full set of integrated predictive tools, but even with developers working at full speed, it is hard to keep up with the R community. Sometimes users want to install and utilize their favorite R packages. This post demonstrates how to install and use additional R packages.
Regression analysis is widely used for prediction and forecasting. Alteryx customers use these statistical tools to understand risk, fraud, customer retention and pricing, among many other business needs.
Time series forecasting is using a model to predict future values based on previously observed values. In a time series forecast, the prediction is based on history and we are assuming the future will resemble the past. We project current trends using existing data.
Neural Networks are frequently referred to as "black box" predictive models. This is because the actual inner workings of why a Neural Network sorts data the way it does are not explicitly available for interpretation. A wide variety of work has been conducted to make Neural Networks more transparent, ranging from visualization methods to developing a Neural Network model that can “show its work”. This article demonstrates how to leverage the NeuralNetTools R package to create a plot of the Neural Network trained by the Alteryx Neural Net tool.
Typically the first step of Cluster Analysis in Alteryx Designer, the K-Centroids Diagnostics Tool assists you in determining an appropriate number of clusters to specify for a clustering solution in the K-Centroids Cluster Analysis Tool, given your data and specified clustering algorithm. Cluster analysis is an unsupervised learning technique, which means that there are no provided labels or targets for the algorithm to base its solution on. In some cases, you may know how many groups your data ought to be split into, but when this is not the case, you can use this tool to guide the number of target clusters your data most naturally divides into.
Clustering analysis has a wide variety of use cases, including harnessing spatial data for grouping stores by location, performing customer segmentation, or even insurance fraud detection. Clustering analysis groups individual observations in a way that each group (cluster) contains data that are more similar to one another than to the data in other groups. Included with the Predictive Tools installation, the K-Centroids Cluster Analysis Tool allows you to perform cluster analysis on a data set with the option of using three different algorithms: K-Means, K-Medians, and Neural Gas. In this Tool Mastery, we will go through the configuration and outputs of the tool.
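To give a feel for what K-Means does, here is a minimal sketch of the basic algorithm (assign each point to its nearest centroid, then move each centroid to the mean of its points). This is an illustration only and not the implementation behind the Alteryx tool; the `k_means` helper, the sample points, and the starting centroids are all made up.

```python
from math import dist

def k_means(points, centroids, iterations=10):
    """Basic K-Means (Lloyd's algorithm) on a list of coordinate tuples."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: label each point with its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids

# Hypothetical store locations forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = k_means(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

Real data rarely separates this cleanly, and the result can depend on the starting centroids, which is one reason to try several starting points and compare solutions.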
The Alteryx Forest Model Tool implements a random forest model using functions in the randomForest R package. Random forest models are an ensemble learning method that leverages the individual predictive power of decision trees into a more robust model by creating a large number of decision trees (i.e., a "forest") and combining all of the individual estimates of the trees into a single model estimate. In this Tool Mastery, we will be reviewing the configuration of the Forest Model Tool, as well as its outputs.
In statistics, standardization (sometimes called data normalization or feature scaling) refers to the process of rescaling the values of the variables in your data set so they share a common scale. Often performed as a pre-processing step, particularly for cluster analysis, standardization may be important to getting the best result in your analysis depending on your data.
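One common way to standardize is the z-score: subtract the mean of a field and divide by its standard deviation. The sketch below assumes that method and uses the population standard deviation; the field names and values are made up, and other approaches (such as rescaling to a 0 to 1 range) are equally valid choices.

```python
from statistics import mean, pstdev

def z_score(values):
    """Rescale values to mean 0 and standard deviation 1."""
    center = mean(values)
    spread = pstdev(values)  # population standard deviation (an assumption)
    return [(v - center) / spread for v in values]

# Hypothetical fields on very different scales.
income = [32000.0, 48000.0, 51000.0, 75000.0]
age = [23.0, 35.0, 41.0, 58.0]

income_z = z_score(income)
age_z = z_score(age)
print(income_z)
print(age_z)
```

After rescaling, both fields are on a comparable scale, so a distance-based method such as cluster analysis is no longer dominated by the field with the larger raw numbers.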
The Neural Network Tool in Alteryx implements functions from the nnet package in R to generate a type of neural network called a multilayer perceptron. By definition, neural network models generated by this tool are feed-forward (meaning data only flows in one direction through the network) and include a single Hidden Layer. In this Tool Mastery, we will review the configuration of the tool, as well as what is included in the Object and Report outputs.
The Field Summary Tool analyzes data and creates a summary report containing descriptive statistics of data in selected columns. It’s a great tool to use when you want to make sure your data is structured correctly before performing any further analysis, most notably with the suite of models that can be generated with the Predictive Tools.
R is an open-source programming language and software environment, specifically intended for statistical computing and graphics. The Alteryx Predictive Tools install includes an installation of R, along with a set of R Packages used by the Predictive Tools. This article describes how to determine which R packages (and versions) are installed for use with your Alteryx R Tool, as well as a few Alteryx-specific packages on GitHub.
The Append Cluster Tool is effectively a Score Tool for the K-Centroids Cluster Analysis Tool. It takes the O anchor output (the model object) of the K-Centroids Cluster Analysis Tool, and a data stream (either the same data used to create the clusters, or a different data set with the same fields), and appends a cluster label to each incoming record. This Tool Mastery reviews its use.
As most of us can agree, predictive models can be extremely useful. Predictive models can help companies allocate their limited marketing budget to the most profitable group of customers, help non-profit organizations find the donors most willing to give to their cause, or even determine the probability a student will be admitted into a given school. A well-designed predictive model can help us make smart and cost-effective business decisions.