on 05-09-2018 09:36 AM - edited on 03-08-2019 12:07 PM by Community_Admin
This article is part of the Tool Mastery Series, a compilation of Knowledge Base contributions to introduce diverse working examples for Designer Tools. Here we’ll delve into uses of the Logistic Regression Tool on our way to mastering Alteryx Designer.
Logistic regression applies to a target variable with two possible outcomes. It creates a model that relates a binary target variable (such as 0/1 or Yes/No) to one or more predictor variables and estimates the probability of each of the two outcomes. Logistic regression differs from other types of regression in that its predictions fall within a range of 0 to 1 and it does not assume that the predictor variables have a constant marginal effect on the target variable, which makes it appropriate for dichotomous target variables (those with only two possible values).
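To make the 0 to 1 range and the non-constant marginal effect concrete, here is a minimal Python sketch of the logistic function with a single predictor. The intercept and slope are made-up values chosen for illustration; they do not come from the sample workflow or from the tool itself.

```python
import math

def predicted_probability(intercept, slope, x):
    """Logistic model with one predictor: p = 1 / (1 + exp(-(intercept + slope * x)))."""
    linear_predictor = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical coefficients, chosen only for illustration.
INTERCEPT, SLOPE = -3.0, 0.5

# Predicted probabilities always stay strictly between 0 and 1.
probabilities = [predicted_probability(INTERCEPT, SLOPE, x) for x in range(0, 13, 2)]

# The effect of a one-unit increase in x depends on the starting point:
# it is largest near p = 0.5 and shrinks toward the extremes.
effect_near_middle = (predicted_probability(INTERCEPT, SLOPE, 7)
                      - predicted_probability(INTERCEPT, SLOPE, 6))
effect_in_tail = (predicted_probability(INTERCEPT, SLOPE, 13)
                  - predicted_probability(INTERCEPT, SLOPE, 12))
```

With these values, moving x from 6 to 7 changes the predicted probability far more than moving it from 12 to 13, even though the slope is the same in both cases.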
Logistic regression can be applied to many problems, including: estimating the probability that a student will graduate, the probability that a voter will vote for a specific candidate, or the probability that someone will respond to a marketing campaign.
As a side note, before building a statistical model, you should always perform a thorough analysis of your data; the Data Investigation Tool Series is a great resource for beginning your pre-model analysis. Once the pre-model analysis is done, you can begin building your model and focus on the details of the Logistic Regression Tool.
We will walk through the configuration of the Logistic Regression Tool using the sample workflow that can be found within Designer:
Help > Sample Workflows > Predictive tool samples > Predictive Analytics > 9 Logistic Regression
Setup Screen
Customization Screen
The customization screen can be reached from the setup screen.
A. Model Screen
Note: The Model screen has three customizable options for building a logistic model (Use sampling weights, Use regularized regression, and Select model type). The settings listed below apply to the Use regularized regression option.
I. Enter value of alpha
II. Standardize predictor variables
III. Use cross-validation to determine model parameters
IV. Value of Lambda used for predictions
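The regularization settings above relate to one another through the penalty term that is added to the model's loss. The sketch below assumes the elastic-net parameterization used by R's glmnet package, where alpha mixes the lasso (L1) and ridge (L2) penalties and lambda sets the overall strength. Whether the tool uses exactly this form internally is an assumption, and the function names and coefficient values are hypothetical.

```python
def elastic_net_penalty(coefficients, alpha, lam):
    """lam * (alpha * sum|b| + (1 - alpha) / 2 * sum(b^2)); the intercept is excluded."""
    l1 = sum(abs(b) for b in coefficients)
    l2 = sum(b * b for b in coefficients)
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2)

def standardize(values):
    """Rescale to mean 0 and standard deviation 1 so the penalty treats predictors comparably."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

coefs = [0.8, -1.5, 0.0]  # hypothetical slope coefficients

ridge_penalty = elastic_net_penalty(coefs, alpha=0.0, lam=0.1)  # pure L2 penalty
lasso_penalty = elastic_net_penalty(coefs, alpha=1.0, lam=0.1)  # pure L1 penalty
scaled_ages = standardize([20.0, 35.0, 50.0])                   # made-up predictor values
```

Under this assumption, alpha = 0 gives ridge regression, alpha = 1 gives lasso, and values in between blend the two. Cross-validation is then a way to choose lambda from the data instead of fixing it by hand.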
B. Cross-validation Screen
The Cross-validation tab allows you to perform cross-validation on your model. To enable it, click the slider in the top-right corner of the configuration screen. You can learn more about cross-validation from this article.
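As a rough illustration of what cross-validation does, the sketch below splits record indices into k folds and holds each fold out once while the rest are used for fitting. The function name, fold count, and seed are illustrative assumptions, not the tool's own settings.

```python
import random

def k_fold_indices(n_records, k, seed=1):
    """Yield (train, holdout) index lists; each record is held out exactly once."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, holdout in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, holdout

splits = list(k_fold_indices(n_records=10, k=5))
```

A model would be fit on each train list and evaluated on the matching holdout list, and the k evaluation results are then averaged to estimate how the model performs on unseen data.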
C. Plots Screen
The plots screen allows the user to choose Graph resolution. There are 3 options:
Now that you know how to configure the tool, let’s look at interpreting the results. The tool has 3 outputs:
O (Output): Displays the model name and size – this outputs the actual model object. You can connect this output to the Score Tool, which generates a predicted value (score) from a separate data stream.
R (Report): Generates a report with the fit statistics for the model. In general, there are 3 major fit statistics that are generated with this output.
I (Interactive): Displays a dashboard with various statistics, such as the counts of actual positives and actual negatives.
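The actual positive and actual negative counts come from comparing observed outcomes with predicted classes. Here is a minimal sketch, assuming a 0.5 probability cutoff and made-up outcomes and scores; the dashboard's own cutoff and metrics may differ.

```python
def confusion_counts(actual, predicted_prob, threshold=0.5):
    """Count true/false positives and negatives for a binary target coded 0/1."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, p in zip(actual, predicted_prob):
        predicted = 1 if p >= threshold else 0
        if y == 1:
            counts["TP" if predicted == 1 else "FN"] += 1
        else:
            counts["FP" if predicted == 1 else "TN"] += 1
    return counts

actual = [1, 0, 1, 1, 0, 0]              # made-up observed outcomes
scores = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]  # made-up predicted probabilities

counts = confusion_counts(actual, scores)
actual_positive = counts["TP"] + counts["FN"]  # records whose true outcome is 1
actual_negative = counts["TN"] + counts["FP"]  # records whose true outcome is 0
accuracy = (counts["TP"] + counts["TN"]) / len(actual)
```

Raising or lowering the threshold trades false positives against false negatives, while the actual positive and actual negative totals stay fixed because they depend only on the data.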
That’s it, folks! You should now be an expert on the Logistic Regression Tool.
Alteryx has a wide variety of kits where you can learn various subjects. The Predictive Starter Kit provides step-by-step tutorials that will teach you how to build core predictive insights, from developing a proper dataset to analyzing the results without requiring niche tools or R-coding.
By now, you should have expert-level proficiency with the Logistic Regression Tool! If you can think of a use case we left out, feel free to use the comments section below! Consider yourself a Tool Master already? Let us know at community@alteryx.com if you’d like your creative tool uses to be featured in the Tool Mastery Series.
Stay tuned with our latest posts every Tool Tuesday by following Alteryx on Twitter! If you want to master all the Designer tools, consider subscribing for email notifications.
How do I select the value of the target variable? I have 0 and 1 for the target, is the logistic regression tool trying to reach 0 or 1?
How does this tool handle different data types?
For example, in the included workflow the data has strings as well as continuous variables (age). When using logistic regression previously, I am sure I had to mess around with the data a lot. In Alteryx, is there some sort of background use of dummy variables or something?