Alteryx Designer Desktop Discussions

Stepwise Regression by iteration with VIF

cyeager
7 - Meteor

I'm looking to recreate a regression analysis similar to the process I use in SPSS. I want to run a stepwise regression that evaluates all the independent variables (IVs) and adds them in order of importance, one step at a time. The model would start with one variable predicting the dependent variable and would report a VIF (very low at that point). The next output would have the best two variables that don't have high multicollinearity. The VIF increases as more variables are added and multicollinearity grows. So far, I feel like the Alteryx Stepwise node doesn't kick out enough variables or allow us to choose the best iteration (the best model with 1 IV, 2 IVs, 3 IVs, 4 IVs, 5 IVs, and so on). Instead, if you start with 80 IVs, you might end up with 50 IVs. :/

 

I'm able to run Stepwise, but it ends up with only one model option, which has way too many variables and high multicollinearity. Ideally, the best model will likely have 3-9 predictor variables. With any more IVs than that, the VIF would likely be too high for my preference. My ultimate goal is to automate the model so that it stops adding variables once the VIF goes over a threshold, like 8.
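To make the stopping rule concrete, here is a minimal sketch in Python with NumPy, outside Alteryx. It is not what the Stepwise node or SPSS does internally: it is a plain greedy forward selection that ranks candidates by R^2 and skips any candidate that would push the largest VIF in the model above a cap. The function names (`r_squared`, `max_vif`, `forward_select`), the `max_vars` limit, and the synthetic data are all assumptions made for illustration, and there is no p-value entry test as in SPSS.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X (intercept added here)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - resid @ resid / ss_tot

def max_vif(X):
    """Largest VIF among the columns of X; 1.0 when there is only one column."""
    k = X.shape[1]
    if k < 2:
        return 1.0
    vifs = []
    for j in range(k):
        # VIF_j = 1 / (1 - R^2) from regressing column j on the other columns.
        r2 = r_squared(np.delete(X, j, axis=1), X[:, j])
        vifs.append(np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return max(vifs)

def forward_select(X, y, names, vif_cap=8.0, max_vars=9):
    """Greedy forward selection; returns one (names, R^2, max VIF) row per step."""
    chosen, path = [], []
    remaining = list(range(X.shape[1]))
    while remaining and len(chosen) < max_vars:
        scored = []
        for j in remaining:
            cols = chosen + [j]
            vif = max_vif(X[:, cols])
            if vif <= vif_cap:  # only consider candidates that stay under the cap
                scored.append((r_squared(X[:, cols], y), vif, j))
        if not scored:
            break  # every remaining variable would push the VIF over the cap
        r2, vif, best = max(scored)
        chosen.append(best)
        remaining.remove(best)
        path.append(([names[i] for i in chosen], float(r2), float(vif)))
    return path

# Synthetic data (hypothetical): x3 is nearly a copy of x1, so the two are collinear.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + 0.05 * rng.normal(size=n)
y = 3 * x1 + 2 * x2 + rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

path = forward_select(X, y, ["x1", "x2", "x3"], vif_cap=8.0)
for step_names, r2, vif in path:
    print(len(step_names), step_names, round(r2, 3), round(vif, 2))
```

Each row of `path` is the model chosen at that size (1 IV, 2 IVs, and so on), so you can pick the iteration you like instead of accepting a single final model. Because x1 and x3 are almost identical here, the cap should keep them from ever entering the same model.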

 

Thank you for any insight into workarounds, and for your thoughts.

 

*Also, interestingly, while the "Variance Inflation Factors" node isn't available as an option for me, I was able to copy and paste it from someone's post on the Community. Some of my predictive nodes are also only available as in-database nodes (blue input instead of green input), but I can find the version I need and copy and paste it into the workflow.

1 REPLY
cyeager
7 - Meteor

One way to reduce the number of IVs that Stepwise wants to keep is to first run a 'Pearson Correlation' and filter out the variables with low ABS(x) correlation scores. This improved the model I was working on, but it doesn't solve the other aspects that would be interesting to be able to do.
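For anyone who wants to see that pre-filter spelled out, here is a rough Python/NumPy sketch of the same idea, outside Alteryx. The function name `filter_by_correlation`, the 0.2 cutoff, and the toy data are assumptions for illustration; the right cutoff depends on your data.

```python
import numpy as np

def filter_by_correlation(X, y, names, min_abs_r=0.2):
    """Keep predictors whose |Pearson r| with y is at least min_abs_r."""
    kept = []
    for j, name in enumerate(names):
        r = np.corrcoef(X[:, j], y)[0, 1]
        if abs(r) >= min_abs_r:
            kept.append((name, float(r)))
    # Strongest absolute correlation first.
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)

# Toy data (hypothetical): two predictors track y closely, one barely does.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X = np.column_stack([
    [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],   # "strong": moves exactly with y
    [6.0, 5.0, 4.0, 3.0, 2.0, 2.0],     # "inverse": moves against y
    [1.0, -1.0, -1.0, 1.0, 1.0, -1.0],  # "weak": close to uncorrelated with y
])

kept = filter_by_correlation(X, y, ["strong", "inverse", "weak"])
for name, r in kept:
    print(name, round(r, 3))
```

Note that this only screens each IV against the dependent variable. It says nothing about how correlated the IVs are with each other, which is why it reduces the variable count without addressing the VIF problem directly.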
