The 7.1 release of Alteryx contains 17 new macros that enable data artisans to easily undertake predictive analytics projects. The tools can be placed into several different groups based on the business analytics functions they perform. The groups are predictive modeling, model assessment, grouping, data investigation, and data sampling for predictive analytics. This post is the first in a 5-part series and will focus on the predictive modeling macros.
Predictive Modeling
There are two basic types of predictive models: those that predict categorical fields (such as buy / don't buy, brand choice, whether a customer renewed their service contract, whether a customer is likely to default on a loan, or the customer segment to which a customer is most likely to belong) and those that predict continuous numeric fields (such as revenue per customer, total monthly unit sales, or the number of annual customer visits to each retail outlet).
In the case of categorical fields, what is predicted is the probability that the unit under study (a customer, an account, an outlet) will fall into each of several possible categories of interest. To give a concrete example, consider a firm that is undertaking a direct marketing campaign targeted to existing customers. A factor of key interest is the probability that a contacted customer will respond favorably to the campaign, based on predictor fields such as the number of days since the customer's last purchase, the frequency of customer purchases over the last year, and the monetary value of those purchases over the same time period. The Alteryx R-based predictive analytics macros in 7.1 that allow a data artisan to develop the appropriate models to predict categorical fields are:
Logistic Regression
Decision Tree
Forest Model
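To make the direct marketing example concrete, here is a minimal sketch of a logistic regression fitted by plain gradient descent on the three predictors described above. It is written in Python purely for illustration and is not how the Alteryx macros are implemented. The customer records, the function names, and the learning-rate and epoch settings are all hypothetical.

```python
import math

# Hypothetical customers: (days since last purchase, purchases in the last
# year, spend in the last year, responded to a past campaign: 1 = yes, 0 = no)
customers = [
    (10, 12, 900, 1), (15, 8, 650, 1), (25, 9, 700, 1), (30, 6, 400, 1),
    (35, 7, 520, 1), (40, 7, 500, 0), (70, 5, 380, 0), (90, 3, 200, 0),
    (110, 4, 260, 1), (120, 2, 150, 0), (150, 4, 220, 0), (200, 1, 60, 0),
]

def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Put each predictor on a comparable scale so gradient descent behaves
features = [row[:3] for row in customers]
labels = [row[3] for row in customers]
means = [sum(col) / len(col) for col in zip(*features)]
sds = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
       for col, m in zip(zip(*features), means)]

def scale(x):
    return [(v - m) / s for v, m, s in zip(x, means, sds)]

def fit_logistic(X, y, learning_rate=0.1, epochs=2000):
    # weights[0] is the intercept; the rest are one coefficient per predictor
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for xi, yi in zip(X, y):
            linear = weights[0] + sum(w * v for w, v in zip(weights[1:], xi))
            error = sigmoid(linear) - yi
            gradient[0] += error
            for j, v in enumerate(xi):
                gradient[j + 1] += error * v
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, gradient)]
    return weights

weights = fit_logistic([scale(x) for x in features], labels)

def response_probability(recency_days, frequency, monetary):
    # Predicted probability that a customer with these values responds
    x = scale((recency_days, frequency, monetary))
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

p_recent = response_probability(5, 10, 800)
p_lapsed = response_probability(180, 1, 50)
print(f"recent, frequent buyer: {p_recent:.2f}")
print(f"lapsed, infrequent buyer: {p_lapsed:.2f}")
```

The output of such a model is a probability for each customer rather than a hard yes/no label, which is what lets a marketer rank customers and contact only those most likely to respond.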
Why are there three different predictive macros to undertake the same predictive analytics task? It turns out that no one type of model (commonly referred to as an "algorithm") consistently does better at predicting a field of interest across applications. In some projects a logistic regression model will outperform a forest model, but in other projects the reverse will be true. As a result, multiple tools are provided, along with macros that compare models developed using different algorithms, to allow a data artisan to create the most effective predictive model possible.
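The idea of comparing algorithms can be sketched in a few lines. The Python example below scores two candidate models on holdout data from two hypothetical projects and reports which one predicts better in each. The two "models" are hand-written stand-ins for already-fitted models (a one-split tree and a linear scoring rule), not output from any Alteryx macro, and all data values are invented for illustration.

```python
def accuracy(predict, rows):
    # Share of holdout rows where the predicted label matches the actual one
    return sum(1 for x, label in rows if predict(x) == label) / len(rows)

def compare_models(models, holdout):
    # Score every candidate on the same holdout rows and pick the best
    scores = {name: accuracy(predict, holdout) for name, predict in models.items()}
    best = max(scores, key=scores.get)
    return scores, best

# Hypothetical stand-ins for two already-fitted models. Inputs are
# (days since last purchase, purchases in the last year); output 1 = responds.
def recency_stump(x):
    # One-split tree: only recency matters
    return 1 if x[0] <= 45 else 0

def linear_score(x):
    # Linear rule that trades recency off against frequency
    return 1 if 0.4 * x[1] - 0.03 * x[0] - 1.0 > 0 else 0

models = {"recency_stump": recency_stump, "linear_score": linear_score}

# Hypothetical holdout sets from two different projects: (predictors, actual)
project_a = [((12, 9), 1), ((20, 3), 0), ((50, 10), 1),
             ((80, 2), 0), ((30, 7), 1), ((140, 1), 0)]
project_b = [((10, 1), 1), ((40, 2), 1), ((60, 12), 0), ((100, 3), 0)]

scores_a, best_a = compare_models(models, project_a)
scores_b, best_b = compare_models(models, project_b)
print("project A:", best_a, scores_a)
print("project B:", best_b, scores_b)
```

In this toy setup the linear rule wins on project A and the one-split tree wins on project B, which mirrors the point that the better algorithm depends on the data at hand. Accuracy is only one possible yardstick; it is used here because it is the simplest to show.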
The categorical predictive models are used in a large number of industry verticals for a number of different purposes. Common business analytics examples include
The predictive analytics macros that enable a data artisan to develop models to predict a continuous numeric field in Alteryx 7.1 are:
Linear Regression
Decision Tree
Forest Model
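For a continuous numeric field, the simplest illustration is a least squares line with a single predictor. The Python sketch below fits one in closed form and uses it to predict revenue for a new customer. The visit and revenue figures and the function name are hypothetical, and the code shows the general technique rather than what any Alteryx macro does internally.

```python
def fit_line(xs, ys):
    # Closed-form ordinary least squares for one predictor
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: annual store visits and annual revenue for five customers
visits = [1, 2, 3, 4, 5]
revenue = [52, 61, 68, 81, 88]

intercept, slope = fit_line(visits, revenue)
predicted_revenue = intercept + slope * 6  # a customer with six visits
print(f"revenue = {intercept:.1f} + {slope:.1f} * visits")
print(f"predicted revenue at 6 visits: {predicted_revenue:.1f}")
```

Unlike the categorical case, the prediction here is a number on the same scale as the target field, not a probability.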
As with the categorical modeling tools, the continuous field modeling tools are applicable to a wide range of verticals and applications. Some common applications of these methods include:
In my next post, I will focus on the model assessment macros which allow the data artisan to fine-tune a model and determine the most effective model for predicting new data.
Dr. Dan Putler is the Chief Scientist at Alteryx, where he is responsible for developing and implementing the product road map for predictive analytics. He has over 30 years of experience in developing predictive analytics models for companies and organizations that cover a large number of industry verticals, ranging from the performing arts to B2B financial services. He is co-author of the book, “Customer and Business Analytics: Applied Data Mining for Business Decision Making Using R”, which is published by Chapman and Hall/CRC Press. Prior to joining Alteryx, Dan was a professor of marketing and marketing research at the University of British Columbia's Sauder School of Business and Purdue University’s Krannert School of Management.