I am training a random forest and a decision tree on the same dataset.
As far as I understand, the split order shows how important each variable is in terms of information gain, with the first split variable being the most important one.
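To make the "first split = highest information gain" idea concrete, here is a minimal sketch that scores every candidate variable at the root by entropy-based information gain and picks the best one. The data and variable names are hypothetical, and it assumes categorical predictors with one branch per value. Note that this only measures the gain at the root; later splits are scored on subsets of the data, so a variable's overall contribution can differ from its position in the split order.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Parent entropy minus the size-weighted entropy of the child nodes
    created by splitting on every distinct value of `feature`."""
    n = len(labels)
    children = {}
    for row, label in zip(rows, labels):
        children.setdefault(row[feature], []).append(label)
    weighted = sum(len(ys) / n * entropy(ys) for ys in children.values())
    return entropy(labels) - weighted

# Hypothetical toy data: two categorical predictors and a binary target.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
    {"outlook": "overcast", "windy": "yes"},
]
labels = ["no", "no", "yes", "no", "yes", "yes"]

# The greedy root split is the variable with the largest gain.
gains = {f: information_gain(rows, labels, f) for f in ("outlook", "windy")}
root = max(gains, key=gains.get)
print(gains, root)
```

Here `outlook` gains about 0.667 bits against about 0.082 for `windy`, so it would be chosen as the first split.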
The random forest output gives a similar report through its variable importance plot. However, on manual inspection, the order of variable importance does not match that of the decision tree, and one variable deviates sharply in the ranking: in the decision tree it is the first variable to be split on, while in the random forest's variable importance plot it is one of the least important.
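Rather than comparing the two orders by eye, one option is to quantify their agreement with Spearman's rank correlation (1 means identical order, -1 means fully reversed) and to list how far each variable moves. This is a sketch with hypothetical variable names and rankings, assuming both lists contain the same variables with no ties; it mirrors the described case where one variable drops from first to last.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of the same
    variables: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    if set(rank_a) != set(rank_b) or len(rank_a) < 2:
        raise ValueError("need two rankings of the same (2+) variables")
    n = len(rank_a)
    pos_b = {var: i for i, var in enumerate(rank_b)}
    d_squared = sum((i - pos_b[var]) ** 2 for i, var in enumerate(rank_a))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical rankings, most important first.
tree_order = ["x1", "x2", "x3", "x4", "x5"]    # decision tree split order
forest_order = ["x2", "x3", "x4", "x5", "x1"]  # random forest importance plot

rho = spearman_rho(tree_order, forest_order)
# Positive shift = the variable ranks lower in the forest than in the tree.
shifts = {v: forest_order.index(v) - tree_order.index(v) for v in tree_order}
print(round(rho, 3), shifts)
```

With these rankings rho comes out at 0.0 even though four of the five variables keep their relative order, which shows how strongly a single large move (here `x1`, shifted by 4) can drive the overall disagreement.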
What would you do in this case to come up with a solid result?
@atamertarslan it sounds like you have a good grasp of interpreting variable importance from the decision tree model. As for the random forest, the tool uses the randomForest R package, and you can find documentation about its importance measure here. I suppose you could favor the random forest measure, since it is averaged over many trees. You could also look at additional importance measures, such as those in the Boosted tool and the Importance Weights tool, to see whether they agree with one or the other.
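One additional measure that can be applied to both models is permutation importance: shuffle one predictor, re-score the model, and record how much accuracy drops. The randomForest package's mean-decrease-in-accuracy measure follows this idea, permuting each predictor on each tree's out-of-bag samples and averaging over trees. The sketch below is a simplified, model-agnostic version on a single dataset, not the randomForest implementation; the `predict` function and data are hypothetical stand-ins for a fitted tree or forest.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows whose prediction matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)

def permutation_importance(predict, rows, labels, feature, n_repeats=20, seed=0):
    """Mean drop in accuracy when `feature` is shuffled across rows.
    Shuffling breaks the link to the target but keeps the distribution."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        values = [r[feature] for r in rows]
        rng.shuffle(values)
        permuted = [{**r, feature: v} for r, v in zip(rows, values)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / n_repeats

# Hypothetical fitted model: predicts "yes" exactly when x1 > 0.5, ignores x2.
def predict(row):
    return "yes" if row["x1"] > 0.5 else "no"

data_rng = random.Random(42)
rows = [{"x1": data_rng.random(), "x2": data_rng.random()} for _ in range(200)]
labels = [predict(r) for r in rows]  # target fully determined by x1

scores = {f: permutation_importance(predict, rows, labels, f) for f in ("x1", "x2")}
print(scores)
```

Because the same procedure can score the decision tree and the random forest on the same held-out data, it gives a common yardstick for checking which of the two rankings the evidence supports.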