Hi,
I have encountered a problem when using the score tool in combination with the boosted model.
After running the model through the score tool, I use the multi-field binning tool to get 10 score groups. However, when I join the target to the binned score groups, I see a higher concentration of the target in the groups with lower probabilities, and vice versa.
When doing the exact same procedure using the logistic regression I get the expected result with higher concentration in the top scoring groups.
Below are tables for boosted and logistic, showing the tiles, max and min probability for each tile and the counts for my target "Kund" in the evaluation sample. I use the exact same scoring procedure.
To me it looks as if the boosted model's "probability of target" is actually the probability of not being the target. Does anyone have an idea why I get this result?
Boosted
X_Kund_Tile_Num | Target | Min_X_Kund | Max_X_Kund | Count |
10 | Kund | 0.616654012593737 | 0.703683675012687 | 229 |
9 | Kund | 0.599899411847223 | 0.616538099066424 | 274 |
8 | Kund | 0.585170547029621 | 0.599889418992929 | 305 |
7 | Kund | 0.569856749292572 | 0.584921578458874 | 330 |
6 | Kund | 0.552700562620164 | 0.569847550773193 | 329 |
5 | Kund | 0.53210919371817 | 0.55252102221165 | 361 |
4 | Kund | 0.49907492182379 | 0.532013523926153 | 398 |
3 | Kund | 0.435100470007177 | 0.498906496353244 | 498 |
2 | Kund | 0.36553792540348 | 0.434932812596185 | 638 |
1 | Kund | 0.176942067657917 | 0.365379109142474 | 998 |
Logistic
X_Kund_Tile_Num | Target | Min_X_Kund | Max_X_Kund | Count |
10 | Kund | 0.610898835927319 | 0.729458226984427 | 908 |
9 | Kund | 0.572179735082082 | 0.610562695487317 | 655 |
8 | Kund | 0.538361700849224 | 0.572172160535576 | 546 |
7 | Kund | 0.506557233405156 | 0.538354010636627 | 402 |
6 | Kund | 0.476900437163732 | 0.505566504353257 | 316 |
5 | Kund | 0.449070890348496 | 0.476780988643294 | 289 |
4 | Kund | 0.42292232290737 | 0.448520984912177 | 277 |
3 | Kund | 0.395885991584448 | 0.421955291607139 | 321 |
2 | Kund | 0.366905445987612 | 0.395807870615525 | 319 |
1 | Kund | 0.30240817634536 | 0.366870729205422 | 327 |
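For anyone who wants to check the direction of the effect quickly, here is a minimal Python sketch that uses only the "Kund" counts from the two tables above. The function name `top_vs_bottom` and the choice of comparing the three highest tiles against the three lowest are illustrative assumptions, not part of any Alteryx tool.

```python
# "Kund" counts per tile, copied from the tables above, ordered tile 1 .. tile 10.
boosted = [998, 638, 498, 398, 361, 329, 330, 305, 274, 229]
logistic = [327, 319, 321, 277, 289, 316, 402, 546, 655, 908]


def top_vs_bottom(counts, k=3):
    """Return (targets in the k highest tiles, targets in the k lowest tiles).

    Hypothetical helper: `counts` must be ordered from the lowest tile to the highest.
    """
    return sum(counts[-k:]), sum(counts[:k])


for name, counts in (("Boosted", boosted), ("Logistic", logistic)):
    top, bottom = top_vs_bottom(counts)
    direction = "as expected" if top > bottom else "inverted"
    print(f"{name}: top 3 tiles = {top}, bottom 3 tiles = {bottom} ({direction})")
```

With these counts the boosted scores have far more targets in the lowest tiles than in the highest, while the logistic scores show the opposite, which is the pattern described above.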
Hi @Fernström - this is due to a change in gbm, the R package underlying the Boosted Model tool: it now returns the probability for the first level of the target when you choose the Bernoulli loss function for a binary target. We are aware of this issue and are working to address it.
In the meantime, for a binary outcome there is no need to specify the Bernoulli loss function, since the default setting results in the same loss function when there are only two outcomes.
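If you have already scored data and want to sanity-check it outside the workflow, the following is a rough Python sketch, not something the tools do for you. It assumes the score really is the probability of the other class in a two-class problem, so the target probability is simply 1 - p. The names `flip_scores` and `assign_tiles` are hypothetical, and the tiling is a plain equal-record split that may not match the binning tool's handling of ties.

```python
def flip_scores(p_first_level):
    """Assumption: each input is P(first level); return P(second level) = 1 - p."""
    return [1.0 - p for p in p_first_level]


def assign_tiles(scores, n_tiles=10):
    """Equal-record tiles: tile 1 holds the lowest scores, tile n_tiles the highest."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    tiles = [0] * len(scores)
    for rank, i in enumerate(order):
        tiles[i] = rank * n_tiles // len(scores) + 1
    return tiles


# Made-up scores, for illustration only.
scores = [0.9, 0.1, 0.5, 0.3]
print(assign_tiles(scores, n_tiles=2))               # tiles from the raw scores
print(assign_tiles(flip_scores(scores), n_tiles=2))  # tiles after taking 1 - p
```

After flipping, the records that were in the highest tile land in the lowest one and vice versa, so the target counts per tile should line up with the logistic results if the inversion is the only problem.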
I see. Thanks a lot for the explanation!