# Data Science Blog

Machine learning & data science for beginners and experts alike.

## When and Why You Would Use the New Alteryx R-Based Predictive Analytics Macros, Part 1: Predictive Modeling

Alteryx

The 7.1 release of Alteryx contains 17 new macros that enable data artisans to easily undertake predictive analytics projects. The tools can be placed into several different groups based on the business analytics functions they perform. The groups are predictive modeling, model assessment, grouping, data investigation, and data sampling for predictive analytics. This post is the first in a 5-part series and will focus on the predictive modeling macros.

### Predictive Modeling

There are two basic types of predictive models: those that predict categorical fields (such as buy / don't buy, brand choice, whether a customer renewed a service contract, whether a customer is likely to default on a loan, or the customer segment to which a customer most likely belongs) and those that predict continuous numeric fields (such as revenue per customer, total monthly unit sales, or the number of annual customer visits to each retail outlet).

In the case of categorical fields, what is predicted is the probability that the unit under study (a customer, an account, an outlet) will fall into each of several possible categories of interest. To give a concrete example, consider a firm that is undertaking a direct marketing campaign targeted at existing customers. A factor of key interest is the probability that a contacted customer will respond favorably to the campaign, based on predictor fields such as the number of days since the customer's last purchase, the frequency of the customer's purchases over the last year, and the monetary value of those purchases over the same time period. The Alteryx R-based predictive analytics macros in 7.1 that allow a data artisan to develop models to predict a categorical field are:

• Logistic Regression
• Decision Tree
• Forest Model
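To make the idea of a predicted probability concrete, here is a minimal Python sketch of how a logistic regression model turns the three predictor fields from the direct marketing example into a response probability. The coefficient values and the function name `response_probability` are made up for illustration; they are not fitted to real campaign data and do not come from an Alteryx macro.

```python
import math

# Hypothetical coefficients, chosen only for illustration (not fitted to data).
INTERCEPT = -1.0
COEF_DAYS_SINCE_LAST_PURCHASE = -0.02   # longer gaps lower the odds of responding
COEF_PURCHASES_LAST_YEAR = 0.30         # frequent buyers are more likely to respond
COEF_SPEND_LAST_YEAR = 0.002            # higher spenders are more likely to respond


def response_probability(days_since_last_purchase, purchases_last_year, spend_last_year):
    """Return the modeled probability that a contacted customer responds."""
    # Linear predictor (log-odds) built from the three predictor fields.
    log_odds = (
        INTERCEPT
        + COEF_DAYS_SINCE_LAST_PURCHASE * days_since_last_purchase
        + COEF_PURCHASES_LAST_YEAR * purchases_last_year
        + COEF_SPEND_LAST_YEAR * spend_last_year
    )
    # The logistic function maps any log-odds value into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-log_odds))


recent_frequent = response_probability(10, 8, 500.0)
lapsed_infrequent = response_probability(300, 1, 40.0)
print(f"recent, frequent buyer: {recent_frequent:.3f}")
print(f"lapsed, infrequent buyer: {lapsed_infrequent:.3f}")
```

With these invented coefficients, the recent and frequent buyer receives a much higher response probability than the lapsed one, which is the kind of ranking a campaign would use to decide whom to contact.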

Why are there three different predictive macros to undertake the same predictive analytics task? It turns out that no one type of model (commonly referred to as an "algorithm") consistently does better at predicting a field of interest across applications. In some projects a logistic regression model will outperform a forest model, but in other projects the reverse will be true. As a result, multiple tools are provided, along with macros that compare models developed using different algorithms, to allow a data artisan to create the most effective predictive model possible.
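The comparison described above can be sketched in a few lines of Python. Two hypothetical scoring rules stand in for models built with different algorithms, and each is scored on the same held-out records. The records, both rules, and the helper name `accuracy` are invented for illustration; this is not how the Alteryx comparison macros are implemented.

```python
import math

# Made-up holdout records: (days_since_last_purchase, purchases_last_year, responded)
HOLDOUT = [
    (5, 9, 1),
    (12, 6, 1),
    (40, 7, 1),
    (45, 2, 0),
    (90, 5, 0),
    (200, 1, 0),
    (20, 1, 0),
    (150, 8, 1),
]


def logistic_rule(days, purchases):
    """Stand-in for a logistic regression model with hypothetical coefficients."""
    log_odds = -1.0 - 0.02 * days + 0.4 * purchases
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return 1 if probability >= 0.5 else 0


def tree_rule(days, purchases):
    """Stand-in for a small decision tree: two hand-written splits."""
    if days <= 30:
        return 1 if purchases >= 3 else 0
    return 1 if purchases >= 6 else 0


def accuracy(model, records):
    """Share of records whose predicted class matches the observed outcome."""
    hits = sum(1 for days, purchases, actual in records if model(days, purchases) == actual)
    return hits / len(records)


scores = {
    "logistic_rule": accuracy(logistic_rule, HOLDOUT),
    "tree_rule": accuracy(tree_rule, HOLDOUT),
}
best = max(scores, key=scores.get)
print(scores, "-> keep", best)
```

On this particular made-up holdout set the tree-style rule scores higher, but a different set of records could reverse the ranking, which is why the comparison is run per project rather than assumed in advance.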

Categorical predictive models are used across many industry verticals for a variety of purposes. Common business analytics examples include:

• A retailer using these methods to determine whom to target in a direct marketing campaign designed to prospect for new customers
• A mobile wireless telephone provider using these methods to determine which of its existing customers are at risk of not renewing when their current contracts expire, so that appropriate marketing actions can be taken to encourage those customers to renew
• A financial services provider using these methods to assess the probability that a loan applicant would default, in order to determine whether to extend the loan

The predictive analytics macros in Alteryx 7.1 that enable a data artisan to develop a model to predict a continuous numeric field are:

• Linear Regression
• Decision Tree
• Forest Model
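For a continuous numeric field, the simplest case is a straight line fitted by ordinary least squares. The Python sketch below fits such a line with one predictor. The data values, the variable names, and the helper `fit_line` are all invented for illustration and are not taken from an Alteryx macro.

```python
# Made-up data: purchases in the last year (predictor) and
# revenue per customer (continuous target).
purchases_last_year = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue_per_customer = [12.0, 19.0, 31.0, 38.0, 50.0]


def fit_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope is the sum of cross-deviations divided by the sum of squared x-deviations.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(purchases_last_year, revenue_per_customer)
# Predict a numeric value for a customer with 6 purchases, outside the sample.
predicted = intercept + slope * 6.0
print(f"revenue = {intercept:.2f} + {slope:.2f} * purchases; at 6 purchases: {predicted:.2f}")
```

Unlike the categorical case, the output here is a number on the same scale as the target field rather than a probability.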

As with the categorical modeling tools, the continuous field modeling tools are applicable to a wide range of verticals and applications. Some common applications of these methods include:

• Predicting the length of time until a critical piece of equipment will have a usage rate that is over capacity in the local radio access network of a cellular telephone provider
• Estimating, for a financial services provider, the impact of an outdoor media marketing campaign on incremental usage of its online banking services
• Forecasting category level sales by store in order to implement an inventory replenishment system for a retailer

In my next post, I will focus on the model assessment macros, which allow the data artisan to fine-tune a model and determine the most effective model for predicting new data.

Dan Putler
Chief Scientist

Dr. Dan Putler is the Chief Scientist at Alteryx, where he is responsible for developing and implementing the product road map for predictive analytics. He has over 30 years of experience in developing predictive analytics models for companies and organizations that cover a large number of industry verticals, ranging from the performing arts to B2B financial services. He is co-author of the book, “Customer and Business Analytics: Applied Data Mining for Business Decision Making Using R”, which is published by Chapman and Hall/CRC Press. Prior to joining Alteryx, Dan was a professor of marketing and marketing research at the University of British Columbia's Sauder School of Business and Purdue University’s Krannert School of Management.
