Alteryx Designer Desktop Discussions

SOLVED

Stepwise Logistic Regression

Chris_J
5 - Atom

Hi, 

I am trying to run a stepwise logistic regression on 40,000 records and 100 variables, and I am having performance challenges on my desktop. I've tried using XDF with Microsoft R Client but see very similar performance. If I am lucky, it finishes in about 16 hours. In some instances, though, the model completes but the Model Comparison node (downloaded from the Predictive District) does not run. This gets rather frustrating when I have to rerun the model (another 16 hours) in the hope of getting the model fit and error measures.

 

I have a couple of questions specifically around the stepwise regression:

 

1. Does anyone else experience the same performance challenges and is there a way around it?

2. Am I just throwing too many variables at the stepwise regression?

3. Are there alternative methods for selecting the appropriate variables for my logistic regression model?

 

Thank you

Chris

3 REPLIES
chadanaber
7 - Meteor

Not sure if the dimensionality of your data set is causing the performance issues, but if that is the case, you may want to look into a principal components analysis (PCA) of your data set. Hopefully this will help you determine which variables to include in your stepwise analysis. PCA is available as a tool in the predictive grouping set. Hope this helps.  :)

Chris_J
5 - Atom

Thank you for the suggestion.

 

Still interested in finding out from others whether they have experienced similar run times with stepwise regression. Or is stepwise regression not that widely used?

PeterGoldey
11 - Bolide

Hi - yes, generally this will be a fairly slow method. I haven't used it in the Alteryx toolset directly, but the number of candidate models it has to fit grows quickly with each added predictor variable. So the advice to try PCA first is pretty good.
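
As a rough illustration of how the work grows with the number of predictors, here is a minimal Python sketch that counts worst-case model fits for forward selection and, for comparison, for an exhaustive search over every subset of variables. It assumes one model fit per candidate variable at each step and no early stopping. The function names are made up for this example, and the search the Alteryx Stepwise tool actually runs (for instance backward or bidirectional) may differ.

```python
def forward_stepwise_fits(n_predictors: int) -> int:
    """Worst-case number of model fits for forward selection.

    Assumption: each step tries every variable not yet in the model, so the
    total is p + (p - 1) + ... + 1 = p * (p + 1) / 2.
    """
    return n_predictors * (n_predictors + 1) // 2


def all_subsets_fits(n_predictors: int) -> int:
    """Number of fits for an exhaustive search over every non-empty subset."""
    return 2 ** n_predictors - 1


if __name__ == "__main__":
    # Compare the two search strategies for a few predictor counts.
    for p in (10, 50, 100):
        print(p, forward_stepwise_fits(p), all_subsets_fits(p))
```

Under these assumptions, 100 variables means up to 5,050 logistic regression fits for forward selection, each on all 40,000 records, so cutting the variable count before running stepwise reduces the work far more than proportionally.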

 

Stepwise also can lead to pretty faulty results and overfitting unless you've done a great job of controlling everything and have good domain expertise to eliminate bleed, etc.

 

If you aren't very familiar with some of the drawbacks, check this out to start:  https://en.wikipedia.org/wiki/Stepwise_regression

 

Personally, I prefer to use domain expertise or PCA to reduce the number of variables and then use an ensemble method in most instances.
