
Alteryx Designer Desktop Discussions


Re: Tool Mastery | Decision Tree

6 - Meteoroid


I was wondering if someone could help me with my modelling:

I'm not able to understand why I get a "poor" result when I change the Input Data I use with the same Decision Tree model:


When I use my Decision Tree to predict the same Input Data it was built with, it produces a relatively "accurate" score (i.e. Score: 524,037 against an Actual Value of 514,996).





But when I change my Input Data and use the same Decision Tree model on new data I want to predict, I get an "inaccurate" score (Score: 421,566, while the Actual Value was in fact 270,820).





Why could that be? Is there something obvious about the modelling that I'm missing?



Alteryx Alumni (Retired)

Hi @joacoachinelli!


Thank you for posting to the Alteryx Community. What you are seeing is an artifact of how Decision Trees (and most machine learning algorithms) work, and is expected behavior. 


A decision tree creates rules for splitting data into groups. The algorithm "learns" these rules based on the training data. A decision tree model will perform well on the data it was trained with because it has effectively already "seen" this data, and created rules to sort this particular data set as correctly as possible.


Because of this, evaluating a model using the training data will always return overly optimistic results, and it is a best practice to evaluate your models using data that was not included in the training data. This subset of data is also known as holdout or validation data.
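To make the training-vs-holdout gap concrete, here is a minimal sketch outside of Alteryx. It uses Python's scikit-learn and synthetic data as stand-ins for the Decision Tree tool and the original workflow's inputs (both are assumptions for illustration; the Alteryx tool itself wraps an R implementation):

```python
# Sketch: score a decision tree on its own training data vs. a holdout set.
# scikit-learn and synthetic data are illustrative stand-ins, not the
# Alteryx Decision Tree tool or the poster's actual data.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data with some noise
X, y = make_regression(n_samples=1000, n_features=10, noise=20.0, random_state=0)

# Hold out 30% of the rows; the model never sees them during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# R^2 on the training rows is inflated because the tree has already
# "seen" them; the holdout score is the honest estimate of performance.
print("training R^2:", tree.score(X_train, y_train))
print("holdout  R^2:", tree.score(X_test, y_test))
```

The training score comes out near-perfect while the holdout score is noticeably lower, which is exactly the pattern described in the original question.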


You can read more about this concept here:




It is possible for a model to focus too much on the specific details in your training data, which causes the model to perform poorly on data it has not seen before because it fails to learn "generalized" rules - this is known as overfitting. Decision trees in particular are prone to overfitting.
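As a hedged sketch of why trees overfit, compare an unconstrained tree (which can memorize the noise in its training rows) with one whose depth is limited so it must learn broader rules. Again, scikit-learn, the synthetic data, and the `max_depth` setting are illustrative assumptions, not the Alteryx tool's actual configuration:

```python
# Sketch: an unconstrained tree memorizes its training data, while a
# depth-limited tree is forced to learn more general splitting rules.
# Data and parameters are illustrative only.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=30.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

deep = DecisionTreeRegressor(random_state=1).fit(X_train, y_train)       # no limit
shallow = DecisionTreeRegressor(max_depth=4, random_state=1).fit(X_train, y_train)

# The deep tree looks much better on training data, but part of that
# advantage is memorized noise that will not carry over to new data.
print("deep    train/holdout R^2:",
      deep.score(X_train, y_train), deep.score(X_test, y_test))
print("shallow train/holdout R^2:",
      shallow.score(X_train, y_train), shallow.score(X_test, y_test))
```

In Alteryx terms, the analogous levers are the Decision Tree tool's model-customization options (such as complexity/pruning settings), which serve the same purpose of keeping the tree from splitting all the way down to the noise.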


You can read more about this here: