Hello all,
I'm working on a very simple logistic regression workflow, but I'm a beginner here and I must say the help isn't that easy to follow.
The data looks like this:
Order_Id | Country | Country_id | Category | Category_id | Amount | Discount | At_Risk |
1 | Germany | 1 | Bike | 1 | 1100 | 0 | 1 |
2 | France | 2 | Bike | 1 | 2200 | 0 | 0 |
3 | Germany | 1 | Clothes | 2 | 30 | 0 | 0 |
4 | Spain | 3 | Bike | 1 | 1290 | 0,2 | 0 |
5 | France | 2 | Bike | 1 | 1750 | 0 | 0 |
6 | Germany | 1 | Bike | 1 | 2100 | 0,3 | 1 |
7 | Spain | 3 | Accessories | 3 | 125 | 0 | 0 |
8 | Portugal | 4 | Bike | 1 | 1800 | 0 | 0 |
My target variable is At_Risk.
The first thing I noticed is that you can't use a numeric variable as the target variable. I don't understand why, or how that makes sense: only string variables are offered in the dropdown menu.
But ok.
Then, I have a few categorical variables (like Country or Category) and also a quantitative one (Amount). It seems that I don't have to treat them differently? And can I use both strings and numbers as predictors? The only thing I have to avoid is near-duplicate variables like Country and Country_id.
Am I right?
Is there something I'm missing? I wonder if I should group Amount into categories instead of using the exact amount.
Thanks for your help.
Simon
@simonaubert_bd, to respond to your first point about the target variable having to be a string: logistic regression is a classification method, and as it is set up in most cases (and in Alteryx) it can only have two outcomes (yes/no, true/false, 0/1). That is why the target variable needs to be a string rather than a number.
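To make that concrete, here is a minimal sketch in plain Python (not Alteryx) of recoding the numeric 0/1 flag as text labels, so it is read as a class rather than a quantity. The label names "Yes"/"No" and the helper `to_label` are assumptions for illustration only. In Alteryx itself, changing the field type to a string (for example with a Select tool) should have the same effect.

```python
# Hypothetical sketch: recode a numeric 0/1 flag as string class labels.
# The At_Risk values are taken from the sample data in the question.
at_risk = [1, 0, 0, 0, 0, 1, 0, 0]

# Assumed label names; any two distinct strings would work.
LABELS = {0: "No", 1: "Yes"}


def to_label(flag):
    """Map a 0/1 flag to its string class label."""
    return LABELS[int(flag)]


at_risk_labels = [to_label(value) for value in at_risk]
print(at_risk_labels)
```

With the target stored as text, a tool has no way to mistake it for a continuous number to be predicted.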
You wouldn't want variables that represent the same information, for example Country and Country_id, as you suggested.
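As a rough illustration of that, the sketch below (plain Python, not Alteryx) keeps one column per piece of information: the text columns Country and Category are turned into 0/1 indicator columns, Amount and Discount stay numeric, and the ID columns are simply left out. The helper names `parse_number`, `one_hot` and `prepare` are made up for this example, and only three rows of the sample data are used.

```python
# Hypothetical sketch: prepare predictors without duplicated information.
rows = [
    {"Country": "Germany", "Country_id": 1, "Category": "Bike",
     "Category_id": 1, "Amount": "1100", "Discount": "0"},
    {"Country": "Spain", "Country_id": 3, "Category": "Bike",
     "Category_id": 1, "Amount": "1290", "Discount": "0,2"},
    {"Country": "Spain", "Country_id": 3, "Category": "Accessories",
     "Category_id": 3, "Amount": "125", "Discount": "0"},
]

CATEGORICAL = ["Country", "Category"]  # text predictors to encode
NUMERIC = ["Amount", "Discount"]       # quantitative predictors kept as numbers
# Country_id and Category_id are never selected: they repeat the text columns.


def parse_number(text):
    """Convert a value such as '0,2' (decimal comma) to a float."""
    return float(str(text).replace(",", "."))


def one_hot(rows, column):
    """Return one dict of 0/1 indicator columns per row for `column`."""
    levels = sorted({row[column] for row in rows})
    return [
        {f"{column}_{level}": int(row[column] == level) for level in levels}
        for row in rows
    ]


def prepare(rows):
    """Build one feature dict per row from the numeric and encoded columns."""
    encoded = [one_hot(rows, column) for column in CATEGORICAL]
    prepared = []
    for i, row in enumerate(rows):
        features = {name: parse_number(row[name]) for name in NUMERIC}
        for indicators in encoded:
            features.update(indicators[i])
        prepared.append(features)
    return prepared


features = prepare(rows)
print(features[0])
```

Many modelling tools do this encoding internally for string fields and often drop one level per variable as a reference, so treat this only as a picture of what happens to the data.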
If you want to predict an exact amount, can I suggest a numerical model instead, e.g. linear regression, random forest, or a decision tree?