This is more of a statistics question than an Alteryx question.
I am currently working on credit line segmentation and on identifying the characteristics of high-limit borrowers compared with lower-limit borrowers. I am wondering whether I can segment credit lines by credit limit, such as:
$5,000-$8,000
$8,000-$10,000
$10,000-$15,000
$15,000-$20,000
$20,000-$30,000
Can I consider these to be a categorical variable and run a logistic regression against the different segments? If not, what would you recommend?
Hi, yes, if your interest is high vs. low, you can absolutely treat it as a logistic regression problem. In general, any continuous predictor variable can be split into segments and treated as a factor, if desired. The only downside is that you can lose information: e.g. $1 is much less than $100,000, but if $1 goes into bin "A" and $100,000 goes into bin "F", there is no "much less than" relationship between the bins. That's something to consider. One option is to convert "bin" to an integer value so that the ordering is kept.
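To make the "convert bin to an integer" idea concrete, here is a minimal Python sketch (not an Alteryx workflow). It assumes the hypothetical segments from the question, and it assumes that a limit sitting exactly on a shared boundary (e.g. $8,000) belongs to the higher segment, which the question does not specify. The names `EDGES`, `LABELS` and `segment_code` are made up for illustration.

```python
from bisect import bisect_right

# Bin edges taken from the hypothetical segments in the question.
EDGES = [5_000, 8_000, 10_000, 15_000, 20_000, 30_000]
LABELS = [
    "$5,000-$8,000",
    "$8,000-$10,000",
    "$10,000-$15,000",
    "$15,000-$20,000",
    "$20,000-$30,000",
]


def segment_code(limit):
    """Return an ordinal code 1..5 for a credit limit, or None if out of range.

    Assumption: a value on a shared boundary goes to the higher segment,
    except the top edge ($30,000), which stays in the last segment.
    """
    if limit < EDGES[0] or limit > EDGES[-1]:
        return None
    # bisect_right counts how many edges are <= limit; cap at the last bin.
    return min(bisect_right(EDGES, limit), len(LABELS))


def segment_label(limit):
    """Return the categorical label for a credit limit, or None if out of range."""
    code = segment_code(limit)
    return None if code is None else LABELS[code - 1]


if __name__ == "__main__":
    for limit in (5_000, 7_999, 8_000, 12_500, 30_000, 45_000):
        print(limit, segment_code(limit), segment_label(limit))
```

The integer code keeps the ordering between bins (1 < 2 < ... < 5), while the label is what you would use if you treat the segment as a plain categorical factor.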
Hi @IJH34
Yes, it would be considered a categorical variable. The other option would be to assign an increasing value from 1 to 5 to each segment and treat it as an ordinal variable. If you do use it as a categorical variable, I would still assign each category an increasing numerical value, so that when R chooses the base category that the other categories are compared against (by default, the first level in sorted order), it picks the one you want rather than one you don't.
A different question: how did you arrive at the categories? Is there a hypothesis or business rule that supports these groupings?
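As a rough illustration of the base-category point, the Python sketch below mimics what happens when levels are ordered by plain string sorting and the first one becomes the reference. It is only an approximation of R's behaviour (R's sort order can depend on locale), and `reference_level`, `dummy_code` and the numbered labels are hypothetical names used for this example.

```python
# Segment labels from the question, in their natural (increasing) order.
LABELS = [
    "$5,000-$8,000",
    "$8,000-$10,000",
    "$10,000-$15,000",
    "$15,000-$20,000",
    "$20,000-$30,000",
]

# Same labels with an increasing numeric prefix (hypothetical naming scheme).
NUMBERED = [f"{i}: {label}" for i, label in enumerate(LABELS, start=1)]


def reference_level(levels):
    """Return the level that sorts first, i.e. the default base category."""
    return sorted(levels)[0]


def dummy_code(value, levels):
    """Treatment-style dummy coding: one 0/1 column per non-base level."""
    ordered = sorted(levels)
    return {level: int(value == level) for level in ordered[1:]}


if __name__ == "__main__":
    # Raw labels: "$10,000-$15,000" sorts first, so it becomes the base.
    print(reference_level(LABELS))
    # Numbered labels: the lowest segment becomes the base, as intended.
    print(reference_level(NUMBERED))
    print(dummy_code("3: $10,000-$15,000", NUMBERED))
```

With the raw labels the middle segment ends up as the base, which is rarely what you want; the numeric prefix fixes that. With ten or more segments the prefix would need zero-padding (01, 02, ...) to sort correctly as text.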
Philip,
Thank you for your solution. Makes complete sense!
The segments presented in the question are just hypothetical; however, I've been able to bin the segments appropriately for the actual model using statistical reasoning.