Hello,
This is my first post, so please bear with me if my question is strange or unclear.
I'm a bit confused by the output of a random forest (RF) classification model. The model tries to predict five categories of customers.
The Browse tool after the RF tool reports an OOB (out-of-bag) estimate of error of 79.5%. If I calculate the error rate from the confusion matrix shown just below it (in the same Browse tool), 62% are misclassified.
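The 62% figure is one minus the share of cases that fall on the diagonal of the confusion matrix. A minimal sketch of that arithmetic, assuming rows are actual classes and columns are predicted classes; the function name and the 5x5 counts are hypothetical and are not the numbers from the model above.

```python
def misclassification_rate(confusion):
    """Share of off-diagonal counts in a square confusion matrix.

    Assumes confusion[i][j] counts cases whose actual class is i and
    whose predicted class is j (rows = actual, columns = predicted).
    """
    total = sum(sum(row) for row in confusion)
    # Correct predictions sit on the diagonal (actual == predicted).
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 1 - correct / total


# Hypothetical 5-class counts for illustration only.
example = [
    [10, 2, 1, 0, 0],
    [3, 8, 2, 1, 0],
    [1, 2, 9, 1, 1],
    [0, 1, 2, 7, 2],
    [0, 0, 1, 3, 6],
]
print(f"{misclassification_rate(example):.1%}")
```

Only the diagonal matters for the overall rate, so per-class error columns that some tools print next to the matrix are not part of this calculation.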
And if I use the Score tool on the test set, 19% are misclassified. (On the training set, the Score tool gives less than 1% misclassified.)
In my world, these numbers should all be fairly close to each other (except perhaps the training-set score).
Am I missing something?
The insanely good score on the training set makes me think my model is overfitted. How do I adjust the RF model to reduce overfitting (if that is the problem)?
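One way to check for overfitting is to fit the forest with a few different settings and compare training, OOB and test error side by side. A minimal sketch, assuming scikit-learn as a stand-in for the RF tool (which may use a different implementation with differently named options) and synthetic 5-class data in place of the customer data. With default, fully grown trees, training error is typically near zero, so the comparison across settings is usually made on the OOB or test error. The `min_samples_leaf` and `max_depth` values are arbitrary examples, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic 5-class data standing in for the customer data (hypothetical).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=6,
    n_classes=5, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0,
)


def error_rates(**params):
    """Fit a forest and return (train, OOB, test) misclassification rates."""
    model = RandomForestClassifier(
        n_estimators=300, oob_score=True, random_state=0, **params
    )
    model.fit(X_train, y_train)
    train_err = 1 - model.score(X_train, y_train)  # error on data the trees saw
    oob_err = 1 - model.oob_score_                 # each case scored only by trees that did not see it
    test_err = 1 - model.score(X_test, y_test)     # error on held-out data
    return train_err, oob_err, test_err


# Default (fully grown trees) vs. constrained trees: larger leaves and
# limited depth stop each tree from fitting the training data so closely.
for params in ({}, {"min_samples_leaf": 5, "max_depth": 10}):
    train_err, oob_err, test_err = error_rates(**params)
    print(params, f"train={train_err:.1%} oob={oob_err:.1%} test={test_err:.1%}")
```

Whether constraining the trees helps depends on the data; the useful signal is how the OOB and test columns move between settings, not the training column alone.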
Thanks