Welcome to part 2 of our discussion of metrics for evaluating your machine learning models! If you missed part 1 and would like a general intro to the meaning of “metrics,” plus a deep dive into the metrics often used for classification models, be sure to check out that post.
In this post, we’ll refresh our knowledge of regression before reviewing some options for metrics used with regression models. Whether you’re feeling determined, absolute or maximum, there’s a metric for your particular regression needs.
As mentioned in part 1, regression models use a different set of metrics from classification models because what you’re trying to predict is a numeric value, not one of a predetermined set of outcomes or classes. Instead of predicting something like “high” or “low,” you’re seeking to predict “1,390,000” or “15.” (Here’s a refresher on linear regression for the basics of that approach. Random forest and decision tree models can also be used for regression, among other options.)
With regression, we build models to make predictions that are as close as possible to the observed outcome values we have on hand in our dataset — for example, the selling prices for houses. Our model suggests a specific mathematical relationship between the predictor variables we choose and the observed outcome values.
When we compare our model’s predicted house prices to the observed house prices, we are calculating what’s called residuals, the differences between predictions and the observed values. The term “error” is also often used to refer to those differences. (There is a statistical distinction between “residual” and “error,” but to avoid the semantic weeds, I’ll refer you to these explanations; you’ll see both terms used in this post.)
For example, if we’re trying to predict houses’ selling prices, and we have a dataset showing that a house sold for $200,000 but our model predicts a sales price of $180,000, the residual is $20,000. Our model will be more useful if we can reduce that difference between the observed and predicted prices — and not just for this house, but across all the houses in our dataset.
We have different options for summarizing a model’s error across all its predictions. That’s where things can get a little fuzzy, because there are a lot of different approaches. As in our discussion of classification metrics, you have to decide which strategy works best for your particular situation: your data, your preferred kind of prediction error, and your need for explainability.
Let’s walk through the different options for metrics you can use in evaluating your regression model. (There are still more, but we’ll just look at the choices offered for the optional customization of the objective function in the AutoML tool, as in part 1 of this post.)
First we’ll check out some different ways of looking at your model’s blunders — specifically, the gaps between its predictions and our observed values. Let’s make up a dataset for the cost of holiday gifts. We’ll assume there were some other features used for prediction here, but to keep it simple, let’s just look at the actual cost, your model’s predicted cost, and the error.
| Gift ID | Actual | Predicted | Error | Squared error |
|---------|--------|-----------|-------|---------------|
| 1       | $120   | $150      | $30   | 900           |
| 2       | $200   | $140      | -$60  | 3600          |
| 3       | $210   | $190      | -$20  | 400           |

Mean absolute error: $36.67
Mean squared error: 1633.33
Mean absolute error (MAE)

Definition: the average (or median) of the absolute values of the errors across all your model’s predictions. Values range from 0 to infinity, and lower values reflect a better-performing model. The mean absolute error for our tiny dataset above is $36.67; the median absolute error is $30.
Important to know:

- MAE is expressed in the same units as your outcome variable (dollars, in our example), which makes it easy to interpret and explain.
- Because it uses absolute values, MAE treats all errors equally: a $30 overestimate counts the same as a $30 underestimate, and one large error isn’t penalized any more heavily than several small ones adding up to the same amount.
- Using the median instead of the mean makes the metric more robust to outliers, since a single wildly wrong prediction won’t drag the median around the way it can the mean.
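As a quick check on the arithmetic, here’s a minimal sketch computing both versions of the metric for the gift dataset above, using scikit-learn’s built-in metric functions (assuming scikit-learn is installed):

```python
from sklearn.metrics import mean_absolute_error, median_absolute_error

# Actual and predicted gift costs from the table above
actual = [120, 200, 210]
predicted = [150, 140, 190]

print(mean_absolute_error(actual, predicted))    # ~36.67
print(median_absolute_error(actual, predicted))  # 30.0
```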
Mean squared error (MSE)

Definition: the average of the squared errors across all your model’s predictions. Values range from 0 to infinity, and lower values reflect a better-performing model. The MSE for our tiny dataset is 1633.33.
Important to know:

- Squaring the errors penalizes large mistakes much more heavily than small ones; a $60 error contributes four times as much to MSE as a $30 error (3600 vs. 900).
- MSE is expressed in squared units (squared dollars here), which makes it harder to interpret directly. Taking its square root gives the root mean squared error (RMSE), which is back in the original units.
- Compare the version of our gift dataset below to the one above: the model’s largest error shrinks from $60 to $40, while the $20 error grows to $40. The mean absolute error doesn’t change at all, but the MSE drops from 1633.33 to 1366.67, showing how this metric rewards reducing big misses.
| Gift ID | Actual | Predicted | Error | Squared error |
|---------|--------|-----------|-------|---------------|
| 1       | $120   | $150      | $30   | 900           |
| 2       | $200   | $160      | -$40  | 1600          |
| 3       | $210   | $170      | -$40  | 1600          |

Mean absolute error: $36.67
Mean squared error: 1366.67
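A small sketch comparing the two versions of the dataset makes the effect visible (again using scikit-learn’s metric functions):

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error

actual = [120, 200, 210]
first_predictions = [150, 140, 190]   # largest error: $60
second_predictions = [150, 160, 170]  # largest error: $40

# MAE is identical for both sets of predictions...
print(mean_absolute_error(actual, first_predictions))   # ~36.67
print(mean_absolute_error(actual, second_predictions))  # ~36.67

# ...but MSE drops once the biggest error shrinks
print(mean_squared_error(actual, first_predictions))    # ~1633.33
print(mean_squared_error(actual, second_predictions))   # ~1366.67
```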
Maximum residual error

Definition: this is your worst-case-scenario metric. Out of all the predictions your model makes, what is the absolute value of its largest mistake? If your model is perfectly fitted (which is unlikely), the maximum residual error will be 0 because every prediction matched the true outcome value. Values begin at 0 and have no fixed upper bound; the metric is simply the size of the model’s single worst error.
Important to know:

- This metric is determined entirely by a single prediction, so it’s extremely sensitive to outliers in your data.
- It’s most useful when the worst case is what matters: if any single large mistake would be costly, evaluating (or optimizing for) maximum error keeps the model’s worst prediction in check.
- In our first gift dataset, the maximum residual error is $60; in the second, it’s $40.
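scikit-learn exposes this metric directly as max_error; here’s a minimal sketch for our two gift datasets:

```python
from sklearn.metrics import max_error

actual = [120, 200, 210]
first_predictions = [150, 140, 190]
second_predictions = [150, 160, 170]

print(max_error(actual, first_predictions))   # 60: the $60 miss on gift 2
print(max_error(actual, second_predictions))  # 40: the worst miss shrank
```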
Our last two metrics assess how well your model and its chosen set of predictors can account for the variation in the outcome variable’s values.
R-squared

Definition: Represents the proportion of the variance in the outcome variable that the model and its predictor variables account for. Values typically range from 0 to 1, with higher values showing that the model is a better fit.
Important to know:

- R-squared is also known as the coefficient of determination.
- Although values usually fall between 0 and 1, R-squared can actually be negative when a model fits the data worse than simply predicting the mean of the outcome variable for every observation.
- A high R-squared doesn’t guarantee unbiased predictions; a model can explain most of the variance while still consistently over- or under-predicting.
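Here’s a minimal sketch using scikit-learn’s r2_score on our tiny gift dataset; note that the result actually comes out slightly negative, illustrating the second point above:

```python
from sklearn.metrics import r2_score

actual = [120, 200, 210]
predicted = [150, 140, 190]

# ~ -0.007: this toy model does marginally worse than always
# predicting the mean gift cost, so R-squared dips below zero
print(r2_score(actual, predicted))
```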
Explained variance

Definition: This is very similar to R-squared, representing the proportion of variance in the outcome variable explained by your model, but with a twist: explained variance subtracts out the mean of the residuals before measuring them. That means it overlooks any consistent bias in the model’s predictions (i.e., a tendency to always over- or under-predict by a similar amount), while R-squared penalizes that bias.
Important to know:

- When your model’s residuals have a mean of zero (no systematic bias), explained variance and R-squared are identical.
- If the two metrics differ, the gap is a signal of systematic bias: the model consistently over- or under-predicts, which explained variance forgives but R-squared does not.
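To see the difference between the two metrics, here’s a minimal sketch with a deliberately biased model that under-predicts every gift by exactly $20:

```python
from sklearn.metrics import explained_variance_score, r2_score

actual = [120, 200, 210]
biased = [cost - 20 for cost in actual]  # every prediction is $20 too low

# Explained variance is a perfect 1.0: the residuals have no variance,
# even though every single prediction is wrong by $20
print(explained_variance_score(actual, biased))  # 1.0

# R-squared penalizes the constant offset
print(r2_score(actual, biased))  # ~0.75
```

Comparing the two scores like this is a quick way to spot consistent bias in a model’s predictions.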
You’re now equipped to review metrics for both classification and regression problems! The choice of which metrics to evaluate isn’t easy, but it’s cool to be able to decide what you want to prioritize for the unique goals and applications of your model.
Susan Currie Sivek, Ph.D., is the data science journalist for the Alteryx Community. She explores data science concepts with a global audience through blog posts and the Data Science Mixer podcast. Her background in academia and social science informs her approach to investigating data and communicating complex ideas — with a dash of creativity from her training in journalism. Susan also loves getting outdoors with her dog and relaxing with some good science fiction. Twitter: @susansivek