This is more of a statistics question than an Alteryx question...
I am currently working on credit line segmentation, identifying characteristics of high-limit borrowers compared to lower-limit borrowers. I am wondering: if I segment credit lines based on credit limit, such as...
Can I treat these segments as a categorical variable and run a logistic regression against them? If not, what would you recommend?
Hi, yes: if your interest is high vs. low, you can absolutely treat this as a logistic regression problem. More generally, any continuous predictor can be split into segments and treated as a factor, if desired. The only downside is that you can lose information. For example, $1 is much less than $100,000, but if $1 goes into bin "A" and $100,000 goes into bin "F", the bin labels themselves carry no "much less than" relationship. That's something to consider; you might convert each bin to an integer value so the ordering is kept.
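As a minimal sketch of the binning idea above (plain Python rather than Alteryx or R), the function below maps a credit limit to both a bin label and an integer code. The cutoffs and the labels "A" to "F" are hypothetical placeholders, since the actual segments from the question were not given. The last line shows the information loss: two quite different limits land in the same bin.

```python
from bisect import bisect_right

# Hypothetical cutoffs in dollars (placeholders, not the poster's segments).
# A limit exactly equal to a cutoff falls into the higher bin.
CUTOFFS = [1_000, 5_000, 10_000, 25_000, 50_000]
LABELS = ["A", "B", "C", "D", "E", "F"]


def segment(limit: float) -> tuple[str, int]:
    """Return (bin label, integer code) for a credit limit.

    The integer code runs from 1 (lowest bin) to 6 (highest bin), so it
    keeps the ordering between bins that a bare label does not express.
    """
    idx = bisect_right(CUTOFFS, limit)
    return LABELS[idx], idx + 1


print(segment(1))        # ('A', 1)
print(segment(100_000))  # ('F', 6)
# Information loss: $1,500 and $4,900 become indistinguishable.
print(segment(1_500), segment(4_900))  # ('B', 2) ('B', 2)
```

The integer code could then be used as an ordinal predictor, while the label could be used as a factor.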
Yes, it would be considered a categorical variable. The other option would be to assign an increasing value to each segment, from 1 to 5, and treat it as an ordinal variable. If you do use it as a categorical variable, I would still label the segments with increasing numbers. By default, R takes the first level in sorted order as the base category that the other categories are compared against, so numbering them makes R pick the base you want rather than one you don't.
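To show what the choice of base category means, here is a small language-neutral sketch in Python of treatment (dummy) coding with an explicit reference level, next to the ordinal 1-to-5 coding. It is not R's implementation. In R itself, a factor's levels default to sorted order, and `relevel()` or an explicit `levels =` argument to `factor()` changes which level is the baseline. The five levels "A" to "E" here are hypothetical.

```python
# Hypothetical segments, ordered from lowest to highest credit limit.
LEVELS = ["A", "B", "C", "D", "E"]


def treatment_code(values, levels, reference):
    """Dummy-code a categorical variable, dropping the reference level.

    Each remaining level gets a 0/1 column. Rows in the reference level are
    all zeros, so each regression coefficient is a contrast against it.
    """
    if reference not in levels:
        raise ValueError(f"reference {reference!r} not in levels")
    kept = [lvl for lvl in levels if lvl != reference]
    rows = [[1 if v == lvl else 0 for lvl in kept] for v in values]
    return kept, rows


def ordinal_code(values, levels):
    """Map each level to its 1-based position in the ordered level list."""
    rank = {lvl: i + 1 for i, lvl in enumerate(levels)}
    return [rank[v] for v in values]


cols, X = treatment_code(["A", "C", "E"], LEVELS, reference="A")
print(cols)  # ['B', 'C', 'D', 'E']
print(X)     # [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
print(ordinal_code(["A", "C", "E"], LEVELS))  # [1, 3, 5]
```

The categorical version estimates a separate effect for each segment relative to the base; the ordinal version uses a single coefficient, which assumes a steady change from one segment to the next.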
A different question is how did you arrive at the categories? Is there a hypothesis or business rule that supports these groupings?