

Regularization in Alteryx


A common concern in predictive modeling is whether a model has been overfit. In statistics, overfitting occurs when a model corresponds too closely (or exactly) to a specific data set and may therefore fail when applied to additional data or future observations. One common method for mitigating overfitting is regularization. In Alteryx, regularized regression is available as an option in the Linear Regression and Logistic Regression tools (under Customize Model).

Three of the most common regularization methods for regression are Ridge Regression, LASSO, and Elastic Net. Ridge Regression, also referred to as Tikhonov regularization or weight decay, adds a penalty on the squared size of the coefficients (an L2 penalty) to the regression being built, which shrinks the coefficients toward zero. LASSO (Least Absolute Shrinkage and Selection Operator) performs both variable selection and regularization: it penalizes the absolute size of the coefficients (an L1 penalty), and as the penalty increases, more coefficients are set to exactly zero, which effectively removes those variables from the model. Elastic Net is a method that linearly combines the penalties of the LASSO and Ridge Regression methods. Like LASSO, Elastic Net will perform variable selection and create reduced models by setting coefficients equal to zero.

Configuring Regularization in Alteryx:

Users can customize regularized regression by changing the values of alpha and lambda (in the model customization tab). The alpha parameter controls the mix between the LASSO penalty and the Ridge Regression penalty. In the convention of glmnet, the R package underlying these tools, an alpha of 1 gives LASSO, an alpha of 0 gives Ridge Regression, and values in between give Elastic Net. By default alpha is set to 0.5, an even blend of the two. The lambda parameter sets the strength of the regularization penalty applied. Setting lambda to 0 applies no regularization, in which case the value of alpha has no effect.
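To make the roles of alpha and lambda concrete, here is a minimal Python sketch of the penalty term as glmnet parameterizes it: lambda * ((1 - alpha) / 2 * sum of squared coefficients + alpha * sum of absolute coefficients). The function name and the coefficient values are hypothetical and chosen only for illustration; this is not code taken from the Alteryx tools, and the intercept is assumed to be left out of the penalty.

```python
def elastic_net_penalty(coefs, alpha, lam):
    """Regularization penalty in the glmnet parameterization.

    alpha = 1 -> LASSO (L1) penalty only; alpha = 0 -> Ridge (L2) penalty only.
    lam scales the whole penalty; lam = 0 switches it off entirely.
    """
    l1 = sum(abs(b) for b in coefs)   # sum of absolute coefficient values
    l2 = sum(b * b for b in coefs)    # sum of squared coefficient values
    return lam * ((1.0 - alpha) / 2.0 * l2 + alpha * l1)


# Hypothetical coefficient vector, for illustration only.
coefs = [2.0, -1.0, 0.0]

for alpha in (0.0, 0.5, 1.0):
    print("alpha =", alpha, "penalty =", elastic_net_penalty(coefs, alpha, lam=0.1))

# With lam = 0 the penalty vanishes, so alpha no longer matters.
print("lam = 0:", elastic_net_penalty(coefs, alpha=0.5, lam=0.0))
```

The fitted model minimizes the usual regression loss plus this penalty, so a larger lambda pushes the coefficients harder toward zero, while alpha decides how much of that push comes from the L1 term versus the L2 term.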

The following chart is a handy reference for how each of the values works:

Choosing between Ridge Regression, Elastic Net, and LASSO:

Ridge Regression keeps all predictor variables in the model and shrinks their coefficients proportionally, without setting any of them to exactly zero. When correlated predictor variables exist in the model, Ridge Regression shrinks the coefficients of the entire group of correlated variables toward one another. If you do not want correlated predictor variables removed from your model, use Ridge Regression.

LASSO acts as a variable selector: with a high enough value of lambda, it keeps only a small subset of variables considered crucial and sets the coefficients of the rest to zero. However, LASSO may not perform well when there are correlated predictor variables, as it tends to select one variable from the correlated group and remove the others. Another limitation of LASSO occurs when there is high dimensionality in the model. When a model contains more variables than records, LASSO can select at most as many variables as there are records, a limit that does not apply to Ridge Regression. When the number of variables included in the model is large, or if the solution is known to be sparse, LASSO is recommended.
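The difference between proportional shrinkage and variable selection can be sketched for a single coefficient. Assuming standardized, uncorrelated predictors (a simplifying assumption made here, not something the tools require), the penalized coefficient has a closed form in the glmnet parameterization: soft-threshold the unpenalized estimate by lambda * alpha, then divide by 1 + lambda * (1 - alpha). The function names and the input values below are hypothetical.

```python
def soft_threshold(z, gamma):
    """Move z toward zero by gamma; return exactly 0.0 if |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def shrink(z, alpha, lam):
    """Penalized coefficient for one standardized, uncorrelated predictor.

    z is the unpenalized (least squares) coefficient.
    """
    return soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))


# Hypothetical unpenalized coefficients, for illustration only.
unpenalized = [3.0, 0.4, -1.5]
lam = 0.5

for name, alpha in (("Ridge", 0.0), ("Elastic Net", 0.5), ("LASSO", 1.0)):
    print(name, [round(shrink(z, alpha, lam), 4) for z in unpenalized])
```

With alpha = 0 every coefficient is divided by the same factor, so none reaches zero. With alpha = 1 each coefficient is moved toward zero by a fixed amount, and any coefficient smaller in magnitude than lambda is dropped.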

Elastic Net acts as a variable selector while also preserving the grouping effect seen in Ridge Regression for correlated variables (shrinking the coefficients of correlated variables together). Unlike LASSO, Elastic Net is not limited by high dimensionality and can evaluate all variables when a model contains more variables than records.

In summary: LASSO induces sparsity; Ridge Regression provides stability and encourages grouping; and Elastic Net attempts to balance both traits.
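The contrast summarized above can be seen on a toy data set. The sketch below is a bare-bones cyclic coordinate descent for the glmnet-style objective, written in plain Python. It assumes the predictors are already standardized (mean 0, mean square 1) and the response is centered, and it is only an illustration of the technique, not the routine glmnet or the Alteryx tools actually run. The data are made up: the first two columns are identical (a perfectly correlated pair) and the third carries only a weak signal.

```python
def soft_threshold(z, gamma):
    """Move z toward zero by gamma; return exactly 0.0 if |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_cd(X, y, alpha, lam, sweeps=200):
    """Cyclic coordinate descent for
    (1 / (2n)) * RSS + lam * ((1 - alpha) / 2 * sum(b^2) + alpha * sum(|b|)).

    Assumes each column of X has mean 0 and mean square 1, and y has mean 0.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals for the current beta (all zeros at the start)
    for _ in range(sweeps):
        for j in range(p):
            # Average product of column j with the partial residual,
            # i.e. the residual with column j's own contribution added back.
            z = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Made-up data: columns 1 and 2 are identical, column 3 is a weak predictor.
X = [[-1.0, -1.0, -1.0],
     [1.0, 1.0, -1.0],
     [-1.0, -1.0, 1.0],
     [1.0, 1.0, 1.0]]
y = [-2.125, 1.875, -1.875, 2.125]  # equals 2 * column 1 + 0.125 * column 3

ridge = elastic_net_cd(X, y, alpha=0.0, lam=0.5)
enet = elastic_net_cd(X, y, alpha=0.5, lam=0.5)
lasso = elastic_net_cd(X, y, alpha=1.0, lam=0.5)

print("Ridge      ", [round(b, 4) for b in ridge])
print("Elastic Net", [round(b, 4) for b in enet])
print("LASSO      ", [round(b, 4) for b in lasso])
```

On this data, Ridge Regression keeps all three variables and gives the identical pair equal coefficients. LASSO keeps only one of the pair and drops the weak predictor. Elastic Net drops the weak predictor but still splits the weight evenly across the pair. Which member of the pair LASSO keeps is an artifact of the update order in this sketch, since for identical columns the LASSO solution is not unique.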

NOTE: The underlying R package used by the Linear Regression and Logistic Regression tools for regularized regression is glmnet.