Score tool
Hi,
I have encountered a problem when using the score tool in combination with the boosted model.
After running the model through the score tool, I use the multi-field binning tool to get 10 score groups. However, when I compare the actual target with the binned score groups, I get a higher concentration of the target in the groups with lower probability, and vice versa.
When I follow the exact same procedure with a logistic regression, I get the expected result: a higher concentration in the top-scoring groups.
Below are tables for boosted and logistic, showing the tiles, max and min probability for each tile and the counts for my target "Kund" in the evaluation sample. I use the exact same scoring procedure.
To me it looks as if the score from the boosted model is actually the probability of not being the target, rather than the probability of the target. Does anyone have an idea why I get this result?
Boosted
X_Kund_Tile_Num | Target | Min_X_Kund | Max_X_Kund | Count
10 | Kund | 0.616654012593737 | 0.703683675012687 | 229
9 | Kund | 0.599899411847223 | 0.616538099066424 | 274
8 | Kund | 0.585170547029621 | 0.599889418992929 | 305
7 | Kund | 0.569856749292572 | 0.584921578458874 | 330
6 | Kund | 0.552700562620164 | 0.569847550773193 | 329
5 | Kund | 0.53210919371817 | 0.55252102221165 | 361
4 | Kund | 0.49907492182379 | 0.532013523926153 | 398
3 | Kund | 0.435100470007177 | 0.498906496353244 | 498
2 | Kund | 0.36553792540348 | 0.434932812596185 | 638
1 | Kund | 0.176942067657917 | 0.365379109142474 | 998
Logistic
X_Kund_Tile_Num | Target | Min_X_Kund | Max_X_Kund | Count
10 | Kund | 0.610898835927319 | 0.729458226984427 | 908
9 | Kund | 0.572179735082082 | 0.610562695487317 | 655
8 | Kund | 0.538361700849224 | 0.572172160535576 | 546
7 | Kund | 0.506557233405156 | 0.538354010636627 | 402
6 | Kund | 0.476900437163732 | 0.505566504353257 | 316
5 | Kund | 0.449070890348496 | 0.476780988643294 | 289
4 | Kund | 0.42292232290737 | 0.448520984912177 | 277
3 | Kund | 0.395885991584448 | 0.421955291607139 | 321
2 | Kund | 0.366905445987612 | 0.395807870615525 | 319
1 | Kund | 0.30240817634536 | 0.366870729205422 | 327
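The pattern in the two tables can be checked with a few lines of Python. This is only a rough sketch: the counts are copied from the tables above, the helper name `top_bottom_share` is made up for illustration, and it assumes the ten tiles hold roughly equal numbers of records so that raw target counts are comparable across tiles.

```python
# Target ("Kund") counts per tile, copied from the tables above.
# Index 0 is tile 1 (lowest scores), index 9 is tile 10 (highest scores).
boosted = [998, 638, 498, 398, 361, 329, 330, 305, 274, 229]
logistic = [327, 319, 321, 277, 289, 316, 402, 546, 655, 908]


def top_bottom_share(counts, k=3):
    """Return the share of all targets in the k highest and k lowest tiles.

    Hypothetical helper; assumes counts are ordered from tile 1 upwards.
    """
    total = sum(counts)
    top = sum(counts[-k:]) / total
    bottom = sum(counts[:k]) / total
    return top, bottom


for name, counts in [("boosted", boosted), ("logistic", logistic)]:
    top, bottom = top_bottom_share(counts)
    print(f"{name}: top 3 tiles {top:.1%}, bottom 3 tiles {bottom:.1%}")
```

With these counts, the boosted model puts roughly 19% of the targets in the three highest tiles against about 49% in the three lowest, while the logistic model shows close to the reverse (about 48% against 22%).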
- Labels:
- Predictive Analysis
Hi @Fernström - this is due to changes in gbm, the R package underlying the Boosted Model tool: it now returns the probability for the first level of the target (when you choose the Bernoulli loss function for binary targets). We are aware of this issue and are working to address it.
In the meantime, for a binary outcome there is actually no need to specify the Bernoulli loss function, since what the tool does by default results in the same loss function when there are only two outcomes.
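If a model has already been scored with the Bernoulli setting, one possible interim check is to flip the score by hand. For a two-level target, if the score column holds the probability of the first level, then the probability of the other level is one minus that value. The sketch below assumes exactly that. The names `flip_score` and `flipped_tile` are illustrative only and are not part of Alteryx or gbm; the sample values are the lowest and highest boosted scores from the table in the question.

```python
def flip_score(p_first_level):
    """Turn P(first level) into P(second level) for a binary target."""
    if not 0.0 <= p_first_level <= 1.0:
        raise ValueError("expected a probability between 0 and 1")
    return 1.0 - p_first_level


def flipped_tile(tile, n_tiles=10):
    """Tile number a record gets once the score ordering is reversed.

    Assumes equal-sized tiles numbered 1..n_tiles.
    """
    return n_tiles + 1 - tile


# Lowest and highest boosted scores reported in the question.
print(flip_score(0.176942067657917))  # lowest raw score becomes the highest
print(flip_score(0.703683675012687))  # highest raw score becomes the lowest
print(flipped_tile(1))                # 10
```

In a workflow, the same arithmetic could go in a formula step before the binning. It is only worth doing after confirming which level of the target the score column actually refers to.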
I see. Thanks a lot for the explanation!
