Variable Importance of Random Forest versus Decision Tree Splits

Question

Dear all,

I am training a random forest and a decision tree on the same dataset.

As far as I understand, the split order shows how important each variable is for information gain, with the first split variable being the most important one.
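To make that reading concrete, here is a minimal sketch of how a root split could be chosen by information gain. It is a simplification under stated assumptions: the toy data and the variable names `x1` and `x2` are hypothetical, only categorical variables are handled, and an actual decision tree tool may use a different criterion (for example Gini impurity).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one categorical feature."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted average entropy of the child nodes after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: x1 separates the classes perfectly, x2 not at all.
rows = [
    {"x1": "a", "x2": "p"},
    {"x1": "a", "x2": "q"},
    {"x1": "b", "x2": "p"},
    {"x1": "b", "x2": "q"},
]
labels = ["yes", "yes", "no", "no"]

gains = {f: information_gain(rows, labels, f) for f in ("x1", "x2")}
root_feature = max(gains, key=gains.get)  # variable chosen for the first split
print(gains, root_feature)
```

Note that this ranks variables only by their gain at the root, on the full dataset. A random forest importance measure aggregates over many trees, each built on a resampled dataset with a random subset of variables considered at each split, so the two rankings need not match.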

The random forest output gives a similar report through its variable importance plot. However, the order of variable importance there does not match the split order of the decision tree (which I checked by manual inspection), and one variable deviates strongly in the ranking: in the decision tree it is the first variable to be split on, while in the random forest's variable importance plot it is one of the least important.

What would you do in this case to arrive at a solid result?

I appreciate your thoughts.

Best,

Atamert

NeilR · Accepted Answer

@atamertarslan it sounds like you have a good grasp of interpreting variable importance from the decision tree model. As for the random forest, the tool uses the randomForest R package, and the importance measure is described in that package's documentation. I suppose you could favor the random forest measure, since it is averaged over many trees. You could also look at additional importance measures, such as those from the Boosted tool and the Importance Weights tool, to see whether they agree with one or the other.
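One way to check whether two importance measures "agree" is to compare their rankings with a Spearman rank correlation. The following is a rough sketch only: the importance scores and the variable names `x1` to `x4` are made up for illustration and would be replaced by the values read from each tool's output. It assumes both measures cover the same variables and contain no ties.

```python
def ranks(importance):
    """Map each variable to its rank (1 = most important)."""
    ordered = sorted(importance, key=importance.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


def spearman(imp_a, imp_b):
    """Spearman rank correlation; assumes the same variables and no ties."""
    ra, rb = ranks(imp_a), ranks(imp_b)
    n = len(ra)
    d2 = sum((ra[v] - rb[v]) ** 2 for v in ra)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Hypothetical importance scores (not taken from any real model output).
tree_importance = {"x1": 0.50, "x2": 0.30, "x3": 0.15, "x4": 0.05}
forest_importance = {"x1": 0.05, "x2": 0.40, "x3": 0.35, "x4": 0.20}
boosted_importance = {"x1": 0.10, "x2": 0.45, "x3": 0.30, "x4": 0.15}

print("tree vs forest:   ", spearman(tree_importance, forest_importance))
print("forest vs boosted:", spearman(forest_importance, boosted_importance))
```

A value near 1 means the two rankings largely agree, while a value near 0 or below means they do not. In these made-up numbers the forest and boosted rankings coincide and the single tree is the outlier, which is the kind of pattern that would support favoring the random forest measure.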