Hi,
I am experimenting with the Decision Tree model and am having difficulty understanding how to test the accuracy of a model on new data. The interactive report produced by the Decision Tree tool is really informative, but as far as I understand, the performance scores (accuracy, F1, precision, recall) are not evaluated on a test set. Does that mean that once the model is built, they are calculated using the entire data set it was trained on? The report showed an accuracy of 72% over 5 classes.
Here is how I tested the accuracy with the Score tool: I split the data into train and test sets and trained the Decision Tree model on the train set only. Then I used the Score tool with the saved model (as a yxdb) and the test data as inputs. If the score column with the highest probability was the one corresponding to the correct label, I marked the record as correctly predicted. This way I got only 20% accuracy for 5 classes, which is far lower. It would make sense to me if the interactive report were generated on new data, so that it reflects the predictive capability of the model, so I am confused about why the accuracy scores differ so much.
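To make the check I described concrete, here is a minimal Python sketch of the same logic. It assumes the scored output has one probability column per class plus the true label. The column names (Score_A, Score_B, ...) and all values are hypothetical and only for illustration; they are not actual Score tool output.

```python
# Hypothetical scored records: one probability column per class plus the true label.
# Column names and values are illustrative assumptions, not real Score tool output.
records = [
    {"label": "A", "Score_A": 0.6, "Score_B": 0.1, "Score_C": 0.1, "Score_D": 0.1, "Score_E": 0.1},
    {"label": "B", "Score_A": 0.2, "Score_B": 0.5, "Score_C": 0.1, "Score_D": 0.1, "Score_E": 0.1},
    {"label": "C", "Score_A": 0.4, "Score_B": 0.2, "Score_C": 0.2, "Score_D": 0.1, "Score_E": 0.1},
    {"label": "D", "Score_A": 0.1, "Score_B": 0.1, "Score_C": 0.1, "Score_D": 0.6, "Score_E": 0.1},
    {"label": "E", "Score_A": 0.1, "Score_B": 0.3, "Score_C": 0.2, "Score_D": 0.2, "Score_E": 0.2},
]


def predicted_class(record, score_prefix="Score_"):
    """Return the class whose score column holds the highest probability."""
    scores = {
        name[len(score_prefix):]: value
        for name, value in record.items()
        if name.startswith(score_prefix)
    }
    return max(scores, key=scores.get)


def accuracy(rows):
    """Fraction of rows where the highest-probability class equals the true label."""
    correct = sum(1 for row in rows if predicted_class(row) == row["label"])
    return correct / len(rows)


print(accuracy(records))  # 3 of the 5 hypothetical records match, so this prints 0.6
```

One thing this makes visible: the comparison only works if the part of the score column name after the prefix matches the label values exactly. With 5 classes, 20% is also roughly what random guessing would give on balanced data, so a mismatch of that kind might be worth ruling out.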
My questions are the following:
How can I get the interactive report while running a saved model on new data?
How is the accuracy calculated in the Decision Tree tool?
Is a decision tree built with C5.0 supported by the Score tool? I got an error message saying that the model is not one of the allowed types.
I would really appreciate your comments on whether I am approaching this correctly, and your help with the questions.
Thanks,
Sophie