I know this sounds unlikely, so I'm probably making a mistake somewhere, but I've been struggling with the decision tree reporting proportions and record counts in each node that do not seem to match the actual data being used. I therefore created the very simple decision tree attached here to show what I mean (albeit with very small differences in this case). The workflow is annotated to show where the decision tree reports different results from my manual analysis. Any thoughts welcome.
Hi Kai,
This does indeed look odd. It looks like the decision tree is rounding the split values on the view. So the split shown as [FARE] >= 23 is actually at 23.3 or somewhere around there. The same applies further down: the split shown as [FARE] <= 7.9 is actually at about 7.87, and the split shown as [FARE] < 15 is actually at something like 15.3. If you put a Filter tool in to remove the passengers affected by these differences, as such:
Filter at top:
[PassengerId] NOT IN ("331","101","853","141")
then everything is the same down to the bottom nodes. That being said, the rounded view is not really a good representation of the tree. Can you please email your module and this information to support@alteryx.com so they can direct it to one of the predictive tools team members, who can work out whether it is something in the R display or something that Alteryx can affect?
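To make the mismatch concrete, here is a minimal Python sketch of why a manual count based on the displayed split can disagree with the node count. The fares are made up for illustration and are not the Titanic values, and 23.3 is only the approximate split mentioned above.

```python
# Hypothetical fares for illustration only (not the real Titanic data).
fares = [7.25, 7.88, 7.90, 15.10, 23.00, 23.25, 23.45, 31.00]

actual_split = 23.3    # assumed: "23.3 or somewhere around there"
displayed_split = 23   # the rounded value shown on the tree plot


def count_at_or_above(values, threshold):
    """Count the records that satisfy value >= threshold."""
    return sum(1 for v in values if v >= threshold)


# What the tree itself counts in the node, using the real split.
tree_count = count_at_or_above(fares, actual_split)

# What a manual filter counts when it copies the rounded value from the plot.
manual_count = count_at_or_above(fares, displayed_split)

# Records that sit between the two thresholds and cause the discrepancy.
affected = [v for v in fares if displayed_split <= v < actual_split]

print(tree_count, manual_count, affected)
```

The records in `affected` play the same role as the four passengers excluded by the filter: once they are removed, both thresholds give the same counts.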
Great question! I love seeing the Titanic data set from Kaggle being used in Alteryx.
The plotting behavior for the tree plot defaults to a precision of 2. However, if you look at the Leaf Summary (the textual depiction of the tree on the report output) you can see the actual split values with greater precision:
Additionally, see the attached macro where I have modified the decision tree to add an input argument for precision on the tree plot:
Leading to the following tree:
The R code to make this change alters the call to rpart.plot to include the digits argument. This argument is passed into the R tool as a new piece of metadata created from a Numeric Up Down interface tool. There are a ton of plotting options for the rpart.plot command; see:
https://cran.r-project.org/web/packages/rpart.plot/rpart.plot.pdf.
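As a rough illustration of what the precision setting does to the displayed split values, here is a small Python sketch. It uses Python's `g` format as a stand-in for significant-digit rounding; it is not the rpart.plot formatting code, and the split values are the approximate ones quoted earlier in this thread, so treat them as assumptions.

```python
# Approximate split values quoted in this thread (assumed, not exact).
splits = [23.3, 7.87, 15.3]


def show(value, digits):
    """Format a number with the given count of significant digits."""
    return f"{value:.{digits}g}"


# Roughly what a precision of 2 displays versus a precision of 4.
two_digits = [show(v, 2) for v in splits]
four_digits = [show(v, 4) for v in splits]

print(two_digits)   # ['23', '7.9', '15']
print(four_digits)  # ['23.3', '7.87', '15.3']
```

With two significant digits the labels collapse to 23, 7.9 and 15, which matches what the default plot shows; raising the precision keeps the labels in line with the values the tree actually splits on.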
Happy Modeling!
Great answer, Sean. Thank you!
Kai :-)