Hi! I am working on finding the best predictor variables for a dependent variable. I have been using the Linear Regression tool to find the correlation with the quantitative variables, but Linear Regression only accepts numeric/binary values. What tool(s) do you recommend for finding correlation with the qualitative (i.e., categorical) variables?
Thanks so much!
Just to make sure I understand: you're setting one of your categorical variables as the target of your Linear Regression, and looking for correlation with other variables that way?
If so, that's really a classification problem rather than a regression problem: you want the system to predict (classify) the category value of the categorical variable. Therefore, any of the classification-type model algorithms (Boosted Model, Decision Tree, Decision Forest, Naive Bayes, SVM, etc.) should work better.
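To illustrate the classification framing, here is a minimal from-scratch Gaussian Naive Bayes sketch in Python that predicts a category from a single numeric variable. It is not Alteryx's Naive Bayes tool; the function names `fit_gaussian_nb` and `predict` and the spend/segment data are hypothetical, made up for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(xs, labels):
    """Fit a one-feature Gaussian Naive Bayes: per-class prior, mean and variance."""
    groups = defaultdict(list)
    for x, label in zip(xs, labels):
        groups[label].append(x)
    n = len(xs)
    model = {}
    for label, vals in groups.items():
        mean = sum(vals) / len(vals)
        # Floor the variance so a class with identical values does not divide by zero.
        var = max(sum((v - mean) ** 2 for v in vals) / len(vals), 1e-9)
        model[label] = (len(vals) / n, mean, var)
    return model


def predict(model, x):
    """Return the class with the highest log posterior for the value x."""
    def log_posterior(params):
        prior, mean, var = params
        log_likelihood = -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return math.log(prior) + log_likelihood
    return max(model, key=lambda label: log_posterior(model[label]))


# Hypothetical data: customer spend and the segment each customer belongs to.
spend = [10, 12, 11, 50, 55, 52]
segment = ["small", "small", "small", "large", "large", "large"]
model = fit_gaussian_nb(spend, segment)
print(predict(model, 13))  # small
```

Here the categorical variable is the target being predicted, which is the setup where classifiers apply.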
Hope that helps!
Thank you JohnJPS.
I am setting my categorical variable as the independent variable and a quantitative variable as the dependent variable. I think your recommendation works for this setup. Any thoughts on which of the classification-type models is best for relating Sales data (dependent) to customer categories (independent)?
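For this particular setup (categorical independent, quantitative dependent), one standard way to measure how strongly the two are related is the correlation ratio, eta squared: the between-category sum of squares divided by the total sum of squares. It ranges from 0 (category means all equal the overall mean) to 1 (all variation lies between categories). This technique is not mentioned elsewhere in the thread; the sketch below is a plain Python illustration with a hypothetical `correlation_ratio` helper and made-up sales figures.

```python
from collections import defaultdict


def correlation_ratio(categories, values):
    """Eta squared: share of the variance in `values` explained by `categories` (0 to 1)."""
    n = len(values)
    grand_mean = sum(values) / n
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        groups[category].append(value)
    # Between-category sum of squares: how far each category mean sits from the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    # Total sum of squares around the grand mean.
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    return ss_between / ss_total if ss_total else 0.0


# Hypothetical data: one row per customer.
customer_type = ["retail", "retail", "wholesale", "wholesale", "online", "online"]
sales = [100, 120, 400, 380, 210, 190]
eta_sq = correlation_ratio(customer_type, sales)
print(round(eta_sq, 3))  # 0.993
```

A value near 1, as in this toy data, means customer type accounts for most of the variation in sales; computing it for each candidate categorical variable gives a rough way to rank them.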
Unfortunately, I wouldn't be the one to hazard a guess as to when one classifier might be better than another. No guarantees, but the Boosted Model is hard to beat in most cases, so it would probably be a safe bet. (The implementation in Alteryx is based on R's "gbm" package, which you could google for more information if desired.)
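As a rough picture of what a boosted model does, the sketch below implements bare-bones gradient boosting for squared-error loss in Python: start from the mean, repeatedly fit a one-split "stump" to the current residuals, and add a shrunken copy of it to the predictions. It is a hypothetical illustration, not R's gbm (which grows deeper trees and supports subsampling, other loss functions and more). The stump here only tests "feature equals value" splits, and `fit_stump`, `fit_boosted`, `n_rounds` and `learning_rate` are made-up names.

```python
def fit_stump(rows, residuals):
    """Pick the single 'feature == value' split whose two group means best fit the residuals."""
    best = None
    for feature in rows[0]:
        for value in {row[feature] for row in rows}:
            inside = [res for row, res in zip(rows, residuals) if row[feature] == value]
            outside = [res for row, res in zip(rows, residuals) if row[feature] != value]
            in_mean = sum(inside) / len(inside)
            out_mean = sum(outside) / len(outside) if outside else 0.0
            sse = sum((r - in_mean) ** 2 for r in inside) + sum((r - out_mean) ** 2 for r in outside)
            if best is None or sse < best[0]:
                best = (sse, feature, value, in_mean, out_mean)
    _, split_feature, split_value, split_in, split_out = best
    return lambda row: split_in if row[split_feature] == split_value else split_out


def fit_boosted(rows, targets, n_rounds=100, learning_rate=0.1):
    """Gradient boosting for squared-error loss using stumps as weak learners."""
    base = sum(targets) / len(targets)  # start from the mean of the target
    preds = [base] * len(targets)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is just the residual.
        residuals = [y - p for y, p in zip(targets, preds)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        # Add a shrunken copy of the new stump to the running predictions.
        preds = [p + learning_rate * stump(row) for p, row in zip(preds, rows)]
    return lambda row: base + learning_rate * sum(s(row) for s in stumps)


# Hypothetical data: customer category as the only feature, sales as the target.
rows = [{"segment": s} for s in ["retail", "retail", "wholesale", "wholesale", "online", "online"]]
sales = [100, 120, 400, 380, 210, 190]
model = fit_boosted(rows, sales, n_rounds=200)
print(round(model({"segment": "wholesale"})))  # close to the wholesale mean of 390
```

With only one categorical feature, the boosted predictions converge toward each category's mean sales; the benefit of boosting shows up when many features and their interactions are involved.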
I believe the relatively new Model Comparison tool would help answer @yuxiliu's question of which classifier is better.
You can find a sample of this, along with the macro, in the Predictive District on the Gallery.
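The general idea behind comparing candidate models is to fit each one on the same training data and score it on the same held-out data. The sketch below does that in Python using RMSE as the score; it is a hypothetical illustration of the idea, not how the Model Comparison tool is implemented, and the two toy "models" (overall mean vs. per-segment mean) and all function names are made up.

```python
import math
from collections import defaultdict


def rmse(model, rows, targets):
    """Root-mean-square error of a model's predictions on a data set."""
    return math.sqrt(sum((model(row) - y) ** 2 for row, y in zip(rows, targets)) / len(targets))


def fit_overall_mean(rows, targets):
    """Baseline model: always predict the training mean."""
    mean = sum(targets) / len(targets)
    return lambda row: mean


def fit_segment_means(rows, targets):
    """Predict the training mean of the row's segment (training mean if the segment is unseen)."""
    groups = defaultdict(list)
    for row, y in zip(rows, targets):
        groups[row["segment"]].append(y)
    means = {seg: sum(ys) / len(ys) for seg, ys in groups.items()}
    fallback = sum(targets) / len(targets)
    return lambda row: means.get(row["segment"], fallback)


def compare(fitters, train_rows, train_y, test_rows, test_y):
    """Fit every candidate on the same training data and rank them by holdout RMSE."""
    scores = {name: rmse(fit(train_rows, train_y), test_rows, test_y) for name, fit in fitters.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical train/holdout split.
train_rows = [{"segment": s} for s in ["retail", "retail", "wholesale", "wholesale"]]
train_y = [100, 110, 400, 390]
test_rows = [{"segment": "retail"}, {"segment": "wholesale"}]
test_y = [105, 395]
ranking = compare({"overall mean": fit_overall_mean, "segment means": fit_segment_means},
                  train_rows, train_y, test_rows, test_y)
print(ranking)  # [('segment means', 0.0), ('overall mean', 145.0)]
```

Scoring every candidate on data none of them was trained on is what makes the comparison fair; any model with a fit/predict shape could be dropped into `fitters`.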
Awesome! Thanks for the recommendation!