Alteryx Designer Desktop Discussions

joacoachinelli · 02-08-2021

Hi!

I was wondering if someone could help me with my modelling:

I'm not able to understand why I get a "poor" result when I apply the same Decision Tree model to a different data input:

When I use my Decision Tree to predict the same data I built it with, the score is relatively "accurate" (e.g., a predicted value of 524,037 against an actual value of 514,996).

But when I use the same Decision Tree model on new input data that I want to predict, the score is "inaccurate" (a predicted value of 421,566 when the actual value was 270,820).

Why could this be? Is there something obvious about the modelling that I'm missing?

Thanks!

SydneyF · 02-09-2021

Hi @joacoachinelli!

Thank you for posting to the Alteryx Community. What you are seeing is an artifact of how Decision Trees (and most machine learning algorithms) work, and is expected behavior.

A decision tree creates rules for splitting data into groups. The algorithm "learns" these rules based on the training data. A decision tree model will perform well on the data it was trained with because it has effectively already "seen" this data, and created rules to sort this particular data set as correctly as possible.
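To illustrate what "learning a rule" means, here is a minimal Python sketch of a single regression split on one numeric feature. It is not how the Alteryx Decision Tree tool is implemented internally; it simply tries every candidate threshold and keeps the one that minimizes the squared error of the two resulting groups. The function names and the house-size data are hypothetical, for illustration only.

```python
def sse(values):
    """Sum of squared errors around the group mean (0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) for the best rule 'x <= threshold'.

    Every midpoint between consecutive distinct x values is tried; the rule
    that leaves the smallest total squared error in the two groups wins.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1:]


if __name__ == "__main__":
    # Hypothetical training data: house size (square metres) vs price (thousands).
    sizes = [50, 60, 70, 120, 130, 140]
    prices = [200, 210, 205, 400, 410, 420]
    threshold, left_pred, right_pred = best_split(sizes, prices)
    print(f"rule: size <= {threshold} -> {left_pred:.1f}, else -> {right_pred:.1f}")
```

The learned rule is chosen purely to fit these six training rows as well as possible, which is why scoring the model on those same rows looks so good.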

Because of this, evaluating a model using the training data will always return overly optimistic results, and it is a best practice to evaluate your models using data that was not included in the training data. This subset of data is also known as holdout or validation data.

You can read more about this concept here:

https://community.alteryx.com/t5/Data-Science/Holdouts-and-Cross-Validation-Why-the-Data-Used-to-Eva...
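As a minimal sketch of keeping training and evaluation data apart, the Python helpers below (hypothetical names, standard library only) shuffle row indices with a fixed seed and either reserve a holdout fraction or build k cross-validation folds. The 80/20 default and the seed value are assumptions for illustration, not recommendations from the linked article.

```python
import random


def holdout_split(n_rows, holdout_fraction=0.2, seed=42):
    """Shuffle row indices and reserve a fraction of them as holdout data.

    Returns (train_indices, holdout_indices). The two lists never overlap,
    so the model is scored only on rows it did not see during training.
    """
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_holdout = max(1, round(n_rows * holdout_fraction))
    return indices[n_holdout:], indices[:n_holdout]


def kfold_indices(n_rows, k=5, seed=42):
    """Yield (train_indices, validation_indices) for k-fold cross-validation.

    Every row lands in exactly one validation fold, and is used for
    training in the other k - 1 folds.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


if __name__ == "__main__":
    train_idx, holdout_idx = holdout_split(100, holdout_fraction=0.2)
    print(len(train_idx), "training rows,", len(holdout_idx), "holdout rows")
```

Fit the model only on the rows in `train_idx`, then compare its predictions with the actual values for the rows in `holdout_idx`; that holdout error is the one that estimates how the model will do on new data.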

It is possible for a model to focus too much on the specific details in your training data, which causes it to perform poorly on data it has not seen before because it fails to learn "generalized" rules. This is known as overfitting. Decision trees in particular are prone to overfitting.

You can read more about this here:

https://community.alteryx.com/t5/Data-Science/Bias-Versus-Variance/ba-p/351862
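The sketch below, again in plain Python with hypothetical names and synthetic data (a step pattern plus Gaussian noise), grows a regression tree either to depth 1 or with no depth limit, then compares mean absolute error on the training rows and on held-out rows. With no depth limit and distinct x values, every leaf ends up holding a single training row, so the training error is exactly zero while the holdout error is not; the exact numbers printed depend on the random seed.

```python
import random


def mean(values):
    return sum(values) / len(values)


def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(pairs, max_depth=None, depth=0):
    """Recursively grow a regression tree on (x, y) pairs.

    A leaf is a float (the mean y of its rows); an internal node is a tuple
    (threshold, left_subtree, right_subtree). max_depth=None keeps splitting
    until each leaf has one row or zero error, which lets it memorise noise.
    """
    ys = [y for _, y in pairs]
    if len(pairs) < 2 or (max_depth is not None and depth >= max_depth) or sse(ys) == 0:
        return mean(ys)
    pairs = sorted(pairs)
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical x values cannot be separated
        cost = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None:
        return mean(ys)
    i = best[1]
    threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return (threshold,
            build_tree(pairs[:i], max_depth, depth + 1),
            build_tree(pairs[i:], max_depth, depth + 1))


def predict(tree, x):
    """Follow the learned rules down to a leaf and return its mean."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def mae(tree, pairs):
    """Mean absolute error of the tree's predictions on (x, y) pairs."""
    return mean([abs(predict(tree, x) - y) for x, y in pairs])


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: y jumps from 100 to 300 at x = 500, plus noise.
    data = [(x, (100 if x < 500 else 300) + rng.gauss(0, 30))
            for x in rng.sample(range(1000), 200)]
    train, holdout = data[:150], data[150:]
    for depth in (1, None):
        tree = build_tree(train, max_depth=depth)
        print(f"max_depth={depth}: train MAE={mae(tree, train):.1f}, "
              f"holdout MAE={mae(tree, holdout):.1f}")
```

Limiting depth (or, more generally, pruning) trades a little training accuracy for rules that are more likely to generalize to new data.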
