Hi, newbie here.
I am trying to solve this problem with Alteryx: we have two variables, a continuous predictor and a categorical target with only two values. We are looking for a simple rule, or set of rules, that splits the continuous predictor at the cut points that best discriminate between the two target values.
Example:
Variable\Register | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Target A | Type1 | Type1 | Type2 | Type2 | Type1 | Type2 | Type1 | Type1 | Type1 |
Predictor B | 12 | 15 | 21 | 26 | 27 | 30 | 67 | 78 | 98 |
We can use the Decision Tree tool and decompose the tree into a rule-based model through the C5.0 algorithm, which gives this solution:
- If B <= 15 then Type1
- If B > 15 and B <= 30 then Type2
- If B > 30 then Type1
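
To make the idea of "best places to cut" concrete, here is a minimal Python sketch of a cut-point search on the example data above. It is not the C5.0 or rpart algorithm: it assumes Gini impurity as the split criterion and a depth limit of 2, and the function names (gini, best_split, find_cuts) are hypothetical, chosen only for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best single cut, or None."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # cannot cut between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = ((lo + hi) / 2, score)  # cut at the midpoint
    return best

def find_cuts(values, labels, max_depth=2):
    """Recursively collect cut points, splitting only while impurity drops."""
    if max_depth == 0 or len(set(labels)) < 2:
        return []
    found = best_split(values, labels)
    if found is None or found[1] >= gini(labels):
        return []
    thr = found[0]
    cuts = [thr]
    for keep_left in (True, False):
        part = [(v, l) for v, l in zip(values, labels) if (v <= thr) == keep_left]
        cuts += find_cuts([v for v, _ in part], [l for _, l in part], max_depth - 1)
    return sorted(cuts)

# Example data from the table above
B = [12, 15, 21, 26, 27, 30, 67, 78, 98]
A = ["Type1", "Type1", "Type2", "Type2", "Type1", "Type2", "Type1", "Type1", "Type1"]

cuts = find_cuts(B, A)
print(cuts)  # [18.0, 48.5]
```

The two cut points fall between 15 and 21 and between 30 and 67, which corresponds to the three rules listed above. Register 5 (B = 27, Type1) lands in the middle interval and is misclassified, so the rules are a best fit rather than a perfect separation.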
However, with real data I often get the following: Error: Decision Tree (12): Decision Tree: Error in apply(prob, 1, max) : dim(X) must have a positive length. Other times I get no error but no rules either: every record gets the same classification.
I had a look at it, and it may be that the variable does not provide enough information to grow the tree. The rpart package limits how far the tree grows through its default control settings.
How can I get around this without dealing with the R code inside Decision Tree?
Also, any ideas on how to solve this problem in Alteryx without using the Decision Tree tool?
Thanks,
Javi