Hi guys,
I'm trying to learn how to perform a logistic regression correctly.
Let's say I have a datasource1 with old data with 5 fields.
Field A is my target variable (yes/no)
Fields B and C are qualitative
Fields D and E are numeric (quantitative)
And I also have a datasource2 with fresh data I want to use to score my prediction.
What's the best approach?
1) Should I just use the Regression Tool with all my fields as predictor variables and see which fields are significant (the number of *** in the regression report)?
2) Or should I perform an association analysis first, decide which fields are significant (contingency tables for the qualitative fields and Pearson/Spearman correlation for the quantitative ones), and then input only the significant fields into the Regression Tool as predictor variables?
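For reference, the screening described in option 2 could look roughly like the sketch below. It is only an illustration in plain Python, not what the Alteryx tools do internally: the rows are made up, the field names A, B and D follow the question, and the helper names chi_square and pearson are hypothetical. Spearman would be the same correlation formula applied to ranks. Keep in mind that screening one field at a time can miss fields that only matter in combination with others.

```python
import math
from collections import Counter

def chi_square(categories, target):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    n = len(target)
    observed = Counter(zip(categories, target))  # counts per (category, class) cell
    row_totals = Counter(categories)
    col_totals = Counter(target)
    stat = 0.0
    for r in row_totals:
        for c in col_totals:
            expected = row_totals[r] * col_totals[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

def pearson(x, y):
    """Pearson correlation coefficient between two numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Made-up toy rows (far too few for a real test): A = target, B = qualitative, D = numeric
A = ["yes", "yes", "yes", "no", "no", "no", "yes", "no"]
B = ["x", "x", "x", "y", "y", "y", "y", "x"]
D = [5.1, 4.8, 6.0, 2.2, 1.9, 2.5, 5.5, 2.0]

stat, dof = chi_square(B, A)
r = pearson(D, [1.0 if a == "yes" else 0.0 for a in A])  # target coded as 0/1
print(f"B vs A: chi-square = {stat:.2f} with {dof} degree(s) of freedom")
print(f"D vs A: correlation = {r:.3f}")
```

The chi-square statistic still has to be compared against a critical value (or converted to a p-value) before calling a field significant.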
Or should you incorporate stepwise selection into the process?
If you're learning, I'd suggest reading through "I'm not afraid of Logistic Regression: A friendly introduction for students and people like them". It's about as easy an introduction to Logistic Regression as you'll find. The only negative is that the examples are in SPSS, but the concepts are the same as in R. You can get it on Amazon for $4.99.
After reading it, you'll understand the different approaches (stepwise forward vs. stepwise backwards), and how to judge your model.
A good way to start out, especially if you're working with regressions for the first time, is to create MULTIPLE models. You can play with the different variable combinations that you create, and then score each of them to see which ends up with the best fit.
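As a rough sketch of that idea outside Alteryx, the plain-Python example below fits one logistic regression per variable combination and compares them on held-out rows. Everything in it is an assumption made for illustration: the data is synthetic (D drives the target, E is pure noise), the fit is a simple gradient descent rather than what the Regression Tool does, and fit_logistic, log_loss and select are hypothetical helper names. Qualitative fields such as B and C would first need to be turned into 0/1 dummy columns.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, row):
    """Predicted probability for one row; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)))

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the average log loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            err = predict(w, row) - target
            grad[0] += err
            for j, xi in enumerate(row):
                grad[j + 1] += err * xi
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w

def log_loss(w, X, y):
    """Average negative log-likelihood; lower means a better fit."""
    total = 0.0
    for row, target in zip(X, y):
        p = min(max(predict(w, row), 1e-12), 1 - 1e-12)
        total -= target * math.log(p) + (1 - target) * math.log(1 - p)
    return total / len(y)

def select(data, cols):
    """Keep only the chosen columns of each row."""
    return [[row[c] for c in cols] for row in data]

random.seed(0)  # fixed seed so the synthetic data is reproducible
rows, labels = [], []
for _ in range(400):
    d, e = random.gauss(0, 1), random.gauss(0, 1)
    rows.append([d, e])
    labels.append(1 if random.random() < sigmoid(2 * d) else 0)

train_X, test_X = rows[:300], rows[300:]
train_y, test_y = labels[:300], labels[300:]

candidates = {"D only": [0], "E only": [1], "D + E": [0, 1]}
scores = {}
for name, cols in candidates.items():
    w = fit_logistic(select(train_X, cols), train_y)
    scores[name] = log_loss(w, select(test_X, cols), test_y)

best = min(scores, key=scores.get)
for name, s in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: holdout log loss = {s:.3f}")
```

Scoring on rows the model never saw during fitting is what keeps the comparison honest; a model with more variables will almost always look better on the rows it was fitted to.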
In your case:
Step 1--Create 3 to 5 logistic regressions with different variable combinations, and check the scoring of each to find the best performer. This can get your brain going on which types of variables predict your target most strongly.
Step 2--Begin creating your own variables out of the ones you already have. For example, if you have a datetime variable, you can begin splitting that into things like Seasons, Weekends/Weekday, Daytime/Nighttime etc. Begin brainstorming all the different ways you can fiddle with the data you already have to make it more useful.
Step 3--Start researching more ways to improve your model. How will you avoid multicollinearity among your predictor variables? Are there ways in which you can grab more data?
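To make Step 2 concrete, here is a minimal plain-Python sketch of splitting a datetime into derived variables. The function name datetime_features is hypothetical, and the cut-offs are assumptions you would adapt to your data: Northern Hemisphere meteorological seasons, Saturday/Sunday as the weekend, and 06:00 to 18:00 as daytime.

```python
from datetime import datetime

# Assumed mapping: Northern Hemisphere meteorological seasons
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

def datetime_features(ts):
    """Derive candidate predictor variables from a single datetime value."""
    return {
        "season": SEASON_BY_MONTH[ts.month],
        "is_weekend": ts.weekday() >= 5,   # Monday is 0, so 5 and 6 are Sat/Sun
        "is_daytime": 6 <= ts.hour < 18,   # assumed daytime window
    }

features = datetime_features(datetime(2024, 7, 6, 14, 30))
print(features)
```

Each derived field can then go into one of the candidate models from Step 1 to see whether it improves the scoring.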
The moral of the story is to jump right in and get your brain working on interesting things!
Hi @Federica_FF,
The Content Engineering team has actually just wrapped up a series of new interactive starter kits, one of which focuses on doing Logistic Regression in Alteryx! You can expect to see the kits on the Gallery by the end of this month. I can post another reply here with the link when they're available if you'd like.
Best,
Bridget
Hi
Thank you guys for the advice!
You're welcome! It should be available soon; I hope it helps!
Hi @Federica_FF,
The logistic regression (and linear regression and A/B testing) starter kits are now available here on the Gallery: https://gallery.alteryx.com/#!app/Predictive-Analytics-Starter-Kit-Volume-1/576326b13df7da0eb48098d5...
Best,
Bridget
Page not found! Sounds valuable, though!