Alteryx Designer Desktop Discussions

SOLVED

Error in Decision Tree?

KaiLarsen
9 - Comet

I know this sounds unlikely, so I'm probably making a mistake somewhere, but I've been struggling with the decision tree reporting proportions and record counts in each node that don't match the actual data being used. I therefore created the very simple decision tree attached here to show what I mean (albeit with very small differences in this case). The workflow is annotated to show where the decision tree reports results that differ from my manual analysis. Any thoughts welcome.

KaiLarsen
9 - Comet

And here is the data. The community system wouldn't accept it in my initial post.

KaneG
Alteryx Alumni (Retired)

Hi Kai,

This does indeed look odd. It looks like the decision tree is rounding the split values shown in the plot. For example, the [FARE] >= 23 split is actually at 23.3 or somewhere around there. Likewise, [FARE] <= 7.9 is actually at about 7.87, and [FARE] < 15 is actually at something like 15.3. If you add a Filter tool to remove the passengers affected by these differences, as follows:

Filter at top:

[PassengerId] NOT IN ("331","101","853","141")

then everything matches down to the bottom nodes. That said, the plot is not a good representation of the splits. Can you please email your module and this information to support@alteryx.com? They can direct it to a member of the predictive tools team, who can work out whether the issue is in the R display or something Alteryx can change.
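
To make the rounding explanation concrete, here is a minimal Python sketch. It counts records on each side of a split twice: once using the threshold shown on the plot, and once using an approximate true threshold. The thresholds are the approximate values from the reply above, not the model's exact split points. The fare list is made up for illustration, and `count_split` and `records_in_gap` are hypothetical helper names.

```python
# Sketch: why node counts read off a rounded tree plot can disagree with a
# manual count. Thresholds are approximations quoted in this thread, and the
# fares are hypothetical illustration data, not the Titanic records.

def count_split(fares, threshold):
    """Count records on each side of a split of the form `fare < threshold`."""
    left = sum(1 for f in fares if f < threshold)
    return left, len(fares) - left


def records_in_gap(fares, shown, actual):
    """Fares that fall on different sides of the split depending on whether
    the displayed (rounded) or the actual threshold is used."""
    lo, hi = sorted((shown, actual))
    return [f for f in fares if lo <= f < hi]


fares = [7.75, 7.8792, 7.925, 15.0, 15.2458, 23.0, 23.25, 26.0]  # hypothetical

# (value shown on the plot, approximate actual split value)
for shown, actual in [(23.0, 23.3), (7.9, 7.87), (15.0, 15.3)]:
    print(f"shown {shown}: {count_split(fares, shown)}  "
          f"actual {actual}: {count_split(fares, actual)}  "
          f"gap: {records_in_gap(fares, shown, actual)}")
```

Each record in a gap list falls on one side of the split according to the plot's label and on the other side according to the model. This is why filtering out the affected passengers makes the counts agree.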

SeanL
Alteryx Alumni (Retired)

Great question! I love seeing the Titanic data set from Kaggle being used in Alteryx.

The tree plot displays numbers with a default precision of 2 significant digits. However, if you look at the Leaf Summary (the text depiction of the tree in the report output), you can see the actual split values with greater precision:

[Image: leaf_summary.png, the Leaf Summary showing the split values at greater precision]
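
You can reproduce the effect of the default precision with Python's `g` format, which also rounds a number to a given count of significant digits. This is only a stand-in for rpart.plot's `digits` argument, since rpart.plot's own number formatting may differ in edge cases. The split values are the approximations quoted earlier in this thread, and `show` is a hypothetical helper name.

```python
# Stand-in for rpart.plot's `digits` argument: Python's "g" format keeps
# `digits` significant digits and drops trailing zeros.
def show(value, digits):
    return format(value, f".{digits}g")


# Approximate split values quoted earlier in the thread (not exact model output).
for split in (23.3, 7.87, 15.3):
    print(f"{split}: digits=2 -> {show(split, 2)}, digits=4 -> {show(split, 4)}")
```

At two significant digits, 23.3, 7.87 and 15.3 display as 23, 7.9 and 15, which matches the labels on the original plot. At four digits they display unchanged.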

Additionally, see the attached macro, in which I modified the decision tree to add an input argument for the precision of the tree plot:

[Image: inputs.png, the macro's inputs including the new precision argument]

This produces the following tree:

[Image: tree_plot_added_precision.png, the tree plot drawn with added precision]

To make this change, the R code adds the digits argument to the call to rpart.plot. The value of this argument is passed into the R tool as a new piece of metadata created by a Numeric Up Down interface tool. The rpart.plot command has many other plotting options; see:

https://cran.r-project.org/web/packages/rpart.plot/rpart.plot.pdf
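
The mechanism described above, in which an interface value feeds the `digits` argument, can be sketched as plain string templating. This is a hypothetical illustration in Python, not the macro's actual R code. The template, the model variable name `the.model`, the helper `build_plot_call`, and the 1 to 10 range check are all assumptions.

```python
# Hypothetical sketch: splice a user-chosen precision into the R call that
# draws the tree. `the.model`, the template, and the allowed range are
# illustrative assumptions, not taken from the Alteryx macro.
R_TEMPLATE = "rpart.plot(the.model, digits = {digits})"


def build_plot_call(digits, lo=1, hi=10):
    """Validate the interface value and substitute it into the R call."""
    digits = int(digits)  # interface values may arrive as text
    if not lo <= digits <= hi:
        raise ValueError(f"digits must be between {lo} and {hi}, got {digits}")
    return R_TEMPLATE.format(digits=digits)


print(build_plot_call(4))  # rpart.plot(the.model, digits = 4)
```

Validating before substitution keeps an out-of-range value from reaching the R code.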

Happy Modeling!

Thanks,

Sean Lopp
Client Services Representative

KaiLarsen
9 - Comet

Great answer, Sean. Thank you!

Kai :-)