
DrDan
Alteryx Alumni (Retired)


At the time I'm writing this, we are putting the finishing touches on the 10.6 release (now available). Many of the new capabilities introduced with this release focus on advanced analytics. We are particularly excited about four new tools for prescriptive analytics.

 

To give some background, analytics is often described as falling into three types: *descriptive* analytics, which provides information that can answer the question "What has happened?"; *predictive* analytics, which provides information that can answer the question "What can we expect to happen?"; and *prescriptive* analytics, which provides information that can answer the question "What should we do?". Most of Alteryx's core functionality is designed to address descriptive analytics use cases, while our existing set of R-based advanced analytics tools is focused on predictive analytics.

 

Alteryx has had some prescriptive analytics capabilities for a while, through its location optimizer feature, but there is a need to provide more general tools in this area. Two of the most commonly used methods for prescriptive analytics are optimization and simulation, and the upcoming 10.6 release features tools that address both of these methods.

 

Optimization

 

Mathematical programming models are central to optimization. In most business applications these methods are used to optimize an objective (such as minimizing cost or maximizing profits) subject to a set of constraints. To give a concrete example, consider a firm in the food service industry that wants to minimize the cost of the cooking oil blend it uses for frying food, but wants to maintain a set of quality standards (the constraints) such as fat composition (not having too much saturated fat), the temperature at which the oil begins to smoke, and flavor characteristics (such as setting a maximum allowed amount of highly flavored oils). In this case, the mathematical programming model will find the lowest cost blend of different cooking oils (such as corn oil, soybean oil, peanut oil, etc.) that meets the set of quality constraints.
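
To make the idea of an objective and constraints concrete, here is a minimal sketch of the cooking oil blend in Python. Every number in it (costs, saturated fat percentages, smoke points, limits) is hypothetical and chosen only for illustration, and it assumes that blend properties combine linearly. It also swaps in a brute-force search over whole-percent blends rather than a real linear programming solver, so it shows the shape of the problem, not how the Optimization tool solves it.

```python
from itertools import product

# Hypothetical data: cost in cents per litre, saturated fat in percent,
# smoke point in degrees Celsius. None of these figures come from the post.
OILS = {
    "corn":    {"cost": 120, "sat_fat": 13, "smoke": 230},
    "soybean": {"cost": 90,  "sat_fat": 16, "smoke": 234},
    "peanut":  {"cost": 180, "sat_fat": 17, "smoke": 232},
}
MAX_SAT_FAT = 15   # blend may contain at most 15% saturated fat
MIN_SMOKE = 231    # blend smoke point of at least 231 C (linear blending assumed)
MAX_PEANUT = 20    # at most 20% of the strongly flavored oil


def blend_value(shares, key):
    """Share-weighted sum of one property; shares are integer percentages."""
    return sum(pct * OILS[name][key] for name, pct in shares.items())


def is_feasible(shares):
    """Check the three quality constraints using integer arithmetic."""
    return (
        blend_value(shares, "sat_fat") <= MAX_SAT_FAT * 100
        and blend_value(shares, "smoke") >= MIN_SMOKE * 100
        and shares["peanut"] <= MAX_PEANUT
    )


def cheapest_blend():
    """Try every whole-percent blend and keep the cheapest feasible one."""
    best = None
    for corn, soybean in product(range(101), repeat=2):
        peanut = 100 - corn - soybean
        if peanut < 0:
            continue
        shares = {"corn": corn, "soybean": soybean, "peanut": peanut}
        if not is_feasible(shares):
            continue
        cost = blend_value(shares, "cost") / 100  # cents per litre of blend
        if best is None or cost < best[1]:
            best = (shares, cost)
    return best


best_shares, best_cost = cheapest_blend()
print(best_shares, best_cost)
```

With these made-up figures the cheapest oil cannot be used on its own because it violates the saturated fat limit, so the search settles on a mix. A linear programming solver reaches the same kind of answer without enumerating candidates, which is what makes it practical for models with many more variables.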

 

Models of this type are used in many industries and use cases. Besides the food service cooking oil example, they can determine which collection of items (both categories and individual items) should be placed on a retail store's shelves to maximize profits, and which stocks should be placed in an institutional investor's portfolio to minimize risk, subject to achieving a desired rate of return. These use cases call for different types of mathematical programming models. The cooking oil example can be addressed using a method known as linear programming, the retail shelf space allocation example requires a method commonly called mixed integer programming, and the selection of stocks for a portfolio is best addressed using a quadratic programming model.

 

All three of the mathematical programming methods listed above are available in Alteryx's new Optimization tool. The underlying software that does the heavy lifting for mathematical programming models is known as a solver. There are both open source and closed source solvers available, and they differ drastically from one another in the problems they can address, in the compute time they need to find an optimal solution, and in their license cost. We have made an effort to be as agnostic as possible in terms of the set of solvers Alteryx can work with. For the 10.6 release, Alteryx will work with open source solvers (such as the COIN-OR project's SYMPHONY solver, the GNU Linear Programming Kit, and the R quadprog package), but, going forward, we will also enable the use of leading proprietary solvers, such as Gurobi and CPLEX.

 

We have also tried to be flexible in how mathematical models can be input into the new Optimization tool. There are three different ways it can be done: (1) by manually entering the objective function and constraints of the model using a set of "smart" interface tools that provide variable suggestions as the user enters the objective and constraints; (2) by providing a set of Alteryx data streams (tables) that specify the model objective and constraints in a manner that will be familiar to users of the Solver plugin for Microsoft Excel; and (3) by providing an algebraic formulation of the problem using CPLEX LP, MathProg, or MPS formats.
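
For readers who have not seen an algebraic formulation before, the sketch below renders a small cooking oil blend model as CPLEX LP text. The `to_lp_format` helper, the variable names, and all coefficients are hypothetical and only meant to show what the format looks like. It handles positive coefficients and minimization only, and it says nothing about how the Optimization tool itself parses such files.

```python
def to_lp_format(objective, constraints, upper_bounds):
    """Render a small minimization model as CPLEX LP text.

    objective:    {variable: coefficient}
    constraints:  {name: ({variable: coefficient}, sense, right_hand_side)}
    upper_bounds: {variable: upper bound}; lower bounds are fixed at 0
    Illustrative helper: assumes all coefficients are positive.
    """
    def linear(terms):
        return " + ".join(f"{coef} {var}" for var, coef in terms.items())

    lines = ["Minimize", f" cost: {linear(objective)}", "Subject To"]
    for name, (terms, sense, rhs) in constraints.items():
        lines.append(f" {name}: {linear(terms)} {sense} {rhs}")
    lines.append("Bounds")
    for var, upper in upper_bounds.items():
        lines.append(f" 0 <= {var} <= {upper}")
    lines.append("End")
    return "\n".join(lines)


# Hypothetical blend model; each variable is the percentage of one oil.
lp_text = to_lp_format(
    objective={"corn": 120, "soybean": 90, "peanut": 180},
    constraints={
        "blend": ({"corn": 1, "soybean": 1, "peanut": 1}, "=", 100),
        "satfat": ({"corn": 13, "soybean": 16, "peanut": 17}, "<=", 1500),
        "smoke": ({"corn": 230, "soybean": 234, "peanut": 232}, ">=", 23100),
    },
    upper_bounds={"corn": 100, "soybean": 100, "peanut": 20},
)
print(lp_text)
```

The printed text has the usual sections of an LP file (`Minimize`, `Subject To`, `Bounds`, `End`), with one named row per constraint. The same model could equally be written in MathProg or MPS; LP format is simply the easiest of the three to read by eye.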

 

Simulation

The term *simulation* actually covers a number of specific methods that are in some ways related but, in important ways, distinct from one another. The three most commonly used methods are Monte Carlo simulation, agent-based models, and discrete event simulation. We focused on Monte Carlo simulation for Alteryx 10.6. However, in the longer term, we plan on addressing all three of these simulation methods.

 

Monte Carlo simulation is used to help make decisions in situations where there are multiple sources of uncertainty. For example, consider an electric utility that needs to decide whether to bring more power plants online, take some offline, or buy power on a day-ahead spot market. To make these decisions, the utility needs to know how much electricity will be demanded by customers in each hour of the following day so that enough power will be available on the grid. Given that space heating and cooling is one of the major uses of electricity, one of the most critical factors driving electricity demand is the temperature in each hour. Unfortunately, all the utility knows about tomorrow's temperatures is the forecast high and low. In this example there are four sources of uncertainty: (1) the difference between the forecast and actual high temperature tomorrow; (2) the difference between the forecast and actual low temperature; (3) the forecast of hourly temperatures given tomorrow's forecast high and low temperatures and other factors; and (4) the forecast of electricity demand given hourly temperatures and other factors. Monte Carlo simulation allows for the assessment of the cumulative impact of all four sources of uncertainty, so the utility can determine a safe level of power to have on the grid at any time the following day, allowing it to make appropriate decisions regarding both the production and purchase of power.
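
The way the four sources of uncertainty compound can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the normal error distributions and their standard deviations, the cosine-shaped daily temperature curve, and the toy demand formula. A real utility would replace each piece with a fitted model.

```python
import math
import random
import statistics

# All constants are hypothetical; the post gives no numbers.
HIGH_ERROR_SD = 2.0     # (1) forecast vs. actual high, degrees C
LOW_ERROR_SD = 2.0      # (2) forecast vs. actual low, degrees C
HOURLY_ERROR_SD = 1.0   # (3) error of the hourly temperature model, degrees C
DEMAND_ERROR_SD = 15.0  # (4) error of the demand model, MW


def hourly_shape(hour):
    """Assumed daily curve: 0 at 03:00 (coolest), 1 at 15:00 (warmest)."""
    return 0.5 * (1.0 + math.cos(2.0 * math.pi * (hour - 15) / 24.0))


def simulate_peak_demand(forecast_high, forecast_low, n_draws=5000, seed=42):
    """Return one simulated peak hourly demand (MW) per Monte Carlo draw."""
    rng = random.Random(seed)
    peaks = []
    for _ in range(n_draws):
        high = forecast_high + rng.gauss(0.0, HIGH_ERROR_SD)
        low = forecast_low + rng.gauss(0.0, LOW_ERROR_SD)
        peak = 0.0
        for hour in range(24):
            temp = (low + (high - low) * hourly_shape(hour)
                    + rng.gauss(0.0, HOURLY_ERROR_SD))
            # Toy demand model: base load plus heating/cooling load that
            # grows with the distance from a comfortable 18 degrees C.
            demand = (500.0 + 12.0 * abs(temp - 18.0)
                      + rng.gauss(0.0, DEMAND_ERROR_SD))
            peak = max(peak, demand)
        peaks.append(peak)
    return peaks


peaks = simulate_peak_demand(forecast_high=33.0, forecast_low=21.0)
mean_peak = statistics.mean(peaks)
safe_level = statistics.quantiles(peaks, n=100)[98]  # 99th percentile
print(f"mean peak: {mean_peak:.0f} MW, 99th percentile: {safe_level:.0f} MW")
```

The gap between the mean peak and the 99th percentile is the practical payoff: planning for the average day would leave the utility short on a meaningful share of days, while the upper percentile gives a capacity target that reflects all four uncertainties at once.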

 

Alteryx's Monte Carlo simulation capabilities are delivered through three different tools: Simulation Sampling, Simulation Score, and Simulation Summary. The Simulation Sampling tool allows random draws for a variable of interest (e.g., the high temperature for the day given the forecast high and a measure of the variance around the forecast). In the tool, a user can sample from existing data for a variable directly, estimate the best fitting parametric distribution for the variable from existing data, specify a parametric distribution for the variable, or "draw" the distribution of the variable (which will then be fit to find an appropriate parametric distribution).
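
Two of these sampling modes are easy to contrast in plain Python: drawing directly from existing data versus fitting a parametric distribution and drawing from that. The forecast errors below are invented, and the sketch fits only a normal distribution, whereas the tool is described as searching for the best fitting distribution, so treat this as an illustration of the idea rather than of the tool's behavior.

```python
import random
from statistics import NormalDist

# Hypothetical history of forecast errors for the daily high
# (actual high minus forecast high, degrees C).
errors = [-2.1, 0.4, 1.3, -0.7, 0.9, 2.2, -1.5, 0.1, -0.3, 1.0]


def sample_empirical(data, n, seed):
    """Draw n values directly from the observed data, with replacement."""
    rng = random.Random(seed)
    return rng.choices(data, k=n)


def sample_fitted_normal(data, n, seed):
    """Fit a normal distribution to the data, then draw n values from it."""
    fitted = NormalDist.from_samples(data)
    return fitted, fitted.samples(n, seed=seed)


empirical_draws = sample_empirical(errors, n=1000, seed=1)
fitted_dist, parametric_draws = sample_fitted_normal(errors, n=1000, seed=1)
print(fitted_dist)
print(min(empirical_draws), max(empirical_draws))
print(min(parametric_draws), max(parametric_draws))
```

Empirical draws can never fall outside the range of the observed data, while draws from the fitted distribution can. Which behavior is preferable depends on how much history is available and how much the tails matter for the decision.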

 

The Simulation Score tool allows for the inclusion of one or more predictive models within a simulation. This tool not only provides predicted values given uncertain predictor variables, it also provides measures of the uncertainty surrounding a model's predictions (also known as prediction intervals).
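
As a rough illustration of scoring under uncertainty, the sketch below pushes uncertain temperature draws through a toy demand model, adds the model's own residual noise, and reads a prediction interval off the resulting draws. The model form, its coefficients, and the temperature distribution are all hypothetical, and the percentile-based interval is just one simple way to summarize the spread.

```python
import random
import statistics

# Hypothetical fitted model:
# demand (MW) = INTERCEPT + SLOPE * |temp - 18| + residual noise
INTERCEPT, SLOPE, RESIDUAL_SD = 500.0, 12.0, 15.0


def score_draws(temp_draws, seed=0):
    """Score each uncertain temperature draw, adding model residual noise."""
    rng = random.Random(seed)
    return [
        INTERCEPT + SLOPE * abs(temp - 18.0) + rng.gauss(0.0, RESIDUAL_SD)
        for temp in temp_draws
    ]


def prediction_interval(draws, level=0.95):
    """Percentile-based interval read directly off the simulated draws."""
    ordered = sorted(draws)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * (len(ordered) - 1))]
    upper = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return lower, upper


# Uncertain predictor: afternoon temperature, assumed normal around 30 C.
temp_rng = random.Random(123)
temp_draws = [temp_rng.gauss(30.0, 2.0) for _ in range(10000)]

demand_draws = score_draws(temp_draws)
point_estimate = statistics.mean(demand_draws)
lower, upper = prediction_interval(demand_draws)
print(f"predicted demand: {point_estimate:.0f} MW "
      f"(95% interval {lower:.0f} to {upper:.0f} MW)")
```

The interval here is wider than the model's residual noise alone would produce, because the uncertainty in the predictor is carried through to the prediction.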

 

The Simulation Summary tool summarizes all the different sources of uncertainty, and examines the relationships between the uncertain variables. The output of this tool provides both an interactive dashboard and a static report of the simulation results.

 

Other New Features

 

In addition to Optimization and Simulation, this release adds the ability to use In-DB predictive tools on the Teradata platform via Microsoft R Server for Teradata. This matches our current Oracle capabilities, allowing for the use of the Linear Regression, Logistic Regression, and Score tools in an In-DB context on the Teradata platform. Going forward, we will expand the number of tools covered, as well as the number of platforms we support, in many cases leveraging Microsoft R Server.

 

 

Associated with this release, we will be placing a second wave of items into the Predictive District of the Alteryx Analytics Gallery. These items include an Analytic App for installing R packages so that they can readily be used in Alteryx (Install R Packages), the TS Model Factory and TS Forecast Factory tools for simultaneously creating a set of time series forecasting models and forecasts (say, for every product sold by a manufacturer) with a minimal number of tools, and a set of three starter kits (AB Testing, Linear Regression, and Logistic Regression) to help users new to predictive analytics become productive quickly.

 

All in all, we think we have moved the ball forward in terms of Alteryx's advanced analytics capabilities, but there is still more work to do.

 

 

Alteryx 10.6 is now available for download here.

Dan Putler
Chief Scientist

Dr. Dan Putler is the Chief Scientist at Alteryx, where he is responsible for developing and implementing the product road map for predictive analytics. He has over 30 years of experience in developing predictive analytics models for companies and organizations that cover a large number of industry verticals, ranging from the performing arts to B2B financial services. He is co-author of the book, “Customer and Business Analytics: Applied Data Mining for Business Decision Making Using R”, which is published by Chapman and Hall/CRC Press. Prior to joining Alteryx, Dan was a professor of marketing and marketing research at the University of British Columbia's Sauder School of Business and Purdue University’s Krannert School of Management.
