Data Science

DrDan · 08-21-2012

The 7.1 release of Alteryx contains 17 new macros that enable data artisans to easily undertake predictive analytics projects. The tools can be placed into several different groups based on the business analytics functions they perform. The groups are predictive modeling, model assessment, grouping, data investigation, and data sampling for predictive analytics. This post is the first in a 5-part series and will focus on the predictive modeling macros.

Predictive Modeling

There are two basic types of predictive models: those that predict categorical fields (such as buy / don't buy, brand choice, whether a customer renewed their service contract, whether a customer is likely to default on a loan, the customer segment to which a customer most likely belongs, and similar business outcomes) and those that predict continuous numeric fields (such as revenue per customer, total monthly unit sales, the number of annual customer visits to each retail outlet, etc.).

In the case of categorical fields, what is predicted is the probability that the unit under study (a customer, an account, an outlet) will fall into each of several possible categories of interest. To give a concrete example, consider a firm that is undertaking a direct marketing campaign targeted to existing customers. A factor of key interest is the probability that a contacted customer will respond favorably to the campaign, based on predictor fields such as the number of days since the customer's last purchase, the frequency of the customer's purchases over the last year, and the monetary value of those purchases over the same time period. The Alteryx R-based predictive analytics macros in 7.1 that allow a data artisan to develop models that predict categorical fields are:

Logistic Regression

Decision Tree

Forest Model
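To make the direct marketing example concrete, here is a minimal sketch of the idea behind a logistic regression: it turns recency, frequency, and monetary value into a response probability. This is plain Python fit by gradient descent, not the code behind the Alteryx macro, and the customer records, function names, and learning settings are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical toy data (not from the post). Each record is
# ((days since last purchase, purchases last year, spend last year), responded)
customers = [
    ((10, 12, 900.0), 1), ((25, 8, 650.0), 1), ((40, 6, 400.0), 1),
    ((60, 7, 520.0), 1), ((30, 4, 300.0), 1), ((90, 5, 380.0), 0),
    ((200, 1, 60.0), 0), ((150, 2, 120.0), 0), ((300, 1, 40.0), 0),
    ((120, 3, 200.0), 0),
]

def column_stats(rows):
    """Mean and standard deviation of each predictor column."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return means, stds

def scale(row, means, stds):
    """Put predictors on a common scale so gradient descent behaves well."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, learning_rate=0.1, epochs=2000):
    """Fit weights and intercept by batch gradient descent on the log loss."""
    weights, intercept, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for xi, yi in zip(X, y):
            error = sigmoid(intercept + sum(w * x for w, x in zip(weights, xi))) - yi
            grad_w = [g + error * x for g, x in zip(grad_w, xi)]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        intercept -= learning_rate * grad_b / n
    return weights, intercept

raw_rows = [row for row, _ in customers]
labels = [label for _, label in customers]
means, stds = column_stats(raw_rows)
weights, intercept = fit_logistic([scale(r, means, stds) for r in raw_rows], labels)

def response_probability(row):
    """Predicted probability that a contacted customer responds."""
    scaled = scale(row, means, stds)
    return sigmoid(intercept + sum(w * x for w, x in zip(weights, scaled)))

p_recent = response_probability((15, 10, 800.0))   # recent, frequent, high spend
p_lapsed = response_probability((250, 1, 50.0))    # lapsed, infrequent, low spend
```

The output is a probability for each customer rather than a hard yes or no, which is what lets a marketer rank customers and contact only those above a chosen cutoff.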

Why are there three different predictive macros to undertake the same predictive analytics task? It turns out that no one type of model (commonly referred to as an "algorithm") consistently does better at predicting a field of interest across applications. In some projects a logistic regression model will outperform a forest model, but in other projects the reverse will be true. As a result, multiple tools are provided, along with macros that compare models developed using different algorithms, to allow a data artisan to create the most effective predictive model possible.
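One way to picture a model comparison is to score each candidate model's predicted probabilities against records that were held out of model fitting. The sketch below is only an illustration of that idea in plain Python. The holdout labels, the predicted probabilities, and the choice of accuracy and log loss as the scoring measures are all assumptions, and none of this is output from the Alteryx macros.

```python
import math

def accuracy(y_true, probs, threshold=0.5):
    """Share of holdout records whose predicted class matches the actual class."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, probs))
    return hits / len(y_true)

def log_loss(y_true, probs, eps=1e-12):
    """Average negative log-likelihood; lower is better."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

# Hypothetical holdout outcomes (1 = responded) and hypothetical predictions
holdout_labels = [1, 0, 1, 1, 0, 0, 1, 0]
candidate_probs = {
    "logistic_regression": [0.81, 0.30, 0.64, 0.55, 0.42, 0.12, 0.47, 0.35],
    "forest_model":        [0.90, 0.20, 0.70, 0.40, 0.60, 0.10, 0.80, 0.30],
}

scores = {
    name: {"accuracy": accuracy(holdout_labels, probs),
           "log_loss": log_loss(holdout_labels, probs)}
    for name, probs in candidate_probs.items()
}

# Pick the candidate with the lowest log loss on the holdout records
best_model = min(scores, key=lambda name: scores[name]["log_loss"])

for name, result in scores.items():
    print(name, result)
print("selected:", best_model)
```

Which model comes out ahead depends on the data and on the measure used, which is why it helps to fit more than one algorithm and compare them on records that none of them saw during fitting.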

The categorical predictive models are used in a large number of industry verticals for a number of different purposes. Common business analytics examples include:

A retailer using these methods to determine who to target in a direct marketing campaign designed to prospect for new customers
A mobile wireless telephone provider using these methods to determine which of its existing customers are at risk of not renewing when their current contracts expire, so that appropriate marketing actions can be taken to encourage those customers to renew
A financial services provider using these methods to assess the probability that a loan applicant would default on a loan in order to determine whether to extend that loan

The predictive analytics macros that enable a data artisan to develop models that predict a continuous numeric field in Alteryx 7.1 are:

Linear Regression

Decision Tree

Forest Model
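For a continuous target, the simplest illustration is an ordinary least-squares line fit with a single predictor. The sketch below is plain Python, not the implementation behind the Alteryx macros. The function name fit_line and the visit and revenue figures are hypothetical and chosen only to show the mechanics.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: annual visits per customer and revenue per customer
annual_visits = [2, 4, 6, 8, 10]
revenue_per_customer = [130.0, 210.0, 330.0, 390.0, 510.0]

slope, intercept = fit_line(annual_visits, revenue_per_customer)

# Predicted revenue for a customer who makes 7 visits a year
predicted_revenue = intercept + slope * 7
print(f"revenue = {intercept:.1f} + {slope:.1f} * visits")
print(f"predicted revenue at 7 visits: {predicted_revenue:.1f}")
```

Unlike the categorical case, the prediction here is a number on the same scale as the target field (revenue), not a probability.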

As with the categorical modeling tools, the continuous field modeling tools are applicable to a wide range of verticals and applications. Some common applications of these methods include:

Predicting the length of time until a critical piece of equipment will have a usage rate that is over capacity in the local radio access network of a cellular telephone provider
Estimating the impact of a financial services provider's outdoor media marketing campaign on its online banking services, measured as incremental service usage
Forecasting category level sales by store in order to implement an inventory replenishment system for a retailer

In my next post, I will focus on the model assessment macros, which allow the data artisan to fine-tune a model and determine the most effective model for predicting new data.


When and Why You Would Use the New Alteryx R-Based Predictive Analytics Macros, Part 1: Predictive Modeling