Alteryx Designer


Specify split in Decision Tree

5 - Atom

Hi Guys,


Is there any way in Alteryx to specify a split in the Decision Tree building process?

Currently, all the splits are chosen by the optimizing algorithm. Can I force it to split on, say, "Age" at my first split?





Hi @loretta



It is possible to specify splits in the Decision Tree building process in Alteryx by essentially using the Decision Tree tools to create a Decision Tree "by hand".


The steps to do this are as follows:



1. Train a tree with "Age" as your only predictor variable. Set "The maximum allowed depth of any node in the final tree" to 2 (1 would be better, but 2 is the lowest the tool will allow), and set "The minimum number of records needed to allow for a split" to the number of records in your dataset (this is another way to ensure the decision tree creates only one split). Both options are in the HyperParameters drop-down, on the Model tab of the Customize window.







2. Using the Report output of the Decision Tree tool, identify the split threshold(s) of the leaves.



3. Use a Filter tool to split the data on "Age" at the split threshold shown in the Decision Tree report.




4. Create "subtrees" for your left and right branches with Decision Tree Tools.





This process will allow you to specify splits in the Decision Tree building process. You can repeat these steps downstream for each split if you would like to. 
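To make the logic of the steps above concrete outside of Alteryx, here is a minimal Python sketch. It is not what the Decision Tree tool runs internally; it only mimics the idea with a hand-written Gini-based search for one threshold. The toy records and the names `gini` and `best_threshold` are made up for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Threshold on one numeric predictor that minimizes weighted Gini impurity."""
    pairs = sorted(zip(values, labels))
    best_t, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between identical values
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x < t]
        right = [y for x, y in pairs if x >= t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_t, best_score = t, score
    return best_t

# Hypothetical toy records: (Age, Income, Churned)
rows = [
    (22, 30, "yes"), (25, 80, "yes"), (31, 45, "yes"),
    (38, 60, "no"), (45, 35, "no"), (52, 90, "no"),
]

# Steps 1-2: learn a one-split "tree" on Age only and read off its threshold.
threshold = best_threshold([r[0] for r in rows], [r[2] for r in rows])

# Step 3: the Filter tool equivalent, splitting the records at that threshold.
left_branch = [r for r in rows if r[0] < threshold]
right_branch = [r for r in rows if r[0] >= threshold]

# Step 4: each branch would then feed its own subtree model.
print(threshold, len(left_branch), len(right_branch))  # prints: 34.5 3 3
```

In the workflow, `left_branch` and `right_branch` correspond to the True and False outputs of the Filter tool, each feeding its own Decision Tree tool.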


Another option might be to create bins (e.g., 0-20, 20-40, 40-60 etc.) for your age data, and subset the data for each of the age bins, then train a separate decision tree on each of these segments.
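As a rough illustration of the binning idea, the Python sketch below assigns each age to a fixed-width bin and groups the records by bin. The bin width, the sample ages, and the `age_bin` helper are assumptions for the example. The bins are deliberately non-overlapping (0-19, 20-39, ...) so that an age of exactly 20 lands in only one segment.

```python
from collections import defaultdict

def age_bin(age, width=20):
    """Label of the fixed-width bin an integer age falls into, e.g. 37 -> '20-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

# Hypothetical sample ages
ages = [5, 19, 20, 37, 41, 59, 60]

# Group the records by bin; each group would get its own decision tree.
segments = defaultdict(list)
for age in ages:
    segments[age_bin(age)].append(age)

for label, members in segments.items():
    print(label, members)
```

Each entry in `segments` is one subset of the data on which a separate tree would be trained.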


Only you know your data and your use case, but I want to mention that when you build a decision tree with all of your predictor variables, at every iteration the Decision Tree Tool chooses the best variable for splitting (based on either the Gini coefficient or the Information Index, depending on how the tool is configured). This means that if your data includes a predictor variable that separates the classes better than Age does, that variable will be chosen first by the Decision Tree Tool.
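To see why another variable can beat Age for the first split, here is a small Python sketch that scores the best single split for each predictor using weighted Gini impurity (lower is better). It is a simplified stand-in for what a tree learner does, not the tool's actual code, and the data is invented so that Income separates the classes perfectly while Age does not.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted Gini after the split) for the best single split."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x < t]
        right = [y for x, y in pairs if x >= t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical data: the target follows Income, not Age.
data = {
    "Age":    [22, 25, 31, 38, 45, 52],
    "Income": [30, 80, 35, 90, 40, 70],
}
target = ["no", "yes", "no", "yes", "no", "yes"]

scores = {name: best_split(values, target) for name, values in data.items()}
chosen = min(scores, key=lambda name: scores[name][1])

for name, (threshold, impurity) in scores.items():
    print(f"{name}: threshold={threshold}, weighted Gini={impurity:.3f}")
print("First split would use:", chosen)
```

With this data the first split goes to Income, which is the situation described above: forcing Age first means accepting a weaker first split.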



Does this all make sense to you? There is a Stack Overflow post that discusses this process in R, if you are interested in additional information.


Please let me know what you think, or if you have further questions!




SydneyF - Customer Support Engineer



5 - Atom

This is amazing!!! Thank you so much for such a detailed explanation!

I had the same idea actually, but didn't know how to execute it in Alteryx. The filter step is quite impressive.


Allow me to be a little bit greedier, 

- Is there any way Alteryx can pass results (e.g. the split thresholds) to the subtrees, so that the user does not need to enter those parameters manually? (Can it be done using a macro?)


You've made Alteryx more powerful now :) I should spend more time to explore...





Hi again @loretta


Off the top of my head, I think there is a way to have Alteryx pass the parameters of a Decision Tree to other tools, but it would require custom R code. The outputs of the Decision Tree Tool are a serialized R model object, a Report, and an Interactive Report. You could potentially extract the information from the R model object, but this would take an R Tool and code that would unserialize the model object and then extract and output the parameters you are interested in as a data frame.


If you had that piece working, you could create a Dynamic Filter Macro with the extracted decision tree parameter as one input and your data as your standard input.
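I have not built this, but the hand-off could look roughly like the Python sketch below. It assumes the split information has already been extracted into simple (field, threshold) records; that layout and the function name `filter_expressions` are hypothetical, and the extraction from the R model object is not shown. The sketch only turns each threshold into a pair of filter expressions in Alteryx's `[Field]` syntax, which a macro could then apply.

```python
def filter_expressions(splits):
    """Build (left, right) filter expression strings for each extracted split.

    `splits` is a list of (field_name, threshold) tuples. This layout is an
    assumption for illustration, not the tool's real output format.
    """
    expressions = []
    for field, threshold in splits:
        left = f"[{field}] < {threshold}"
        right = f"[{field}] >= {threshold}"
        expressions.append((left, right))
    return expressions

# Hypothetical extracted split: Age at 34.5
extracted = [("Age", 34.5)]

for left, right in filter_expressions(extracted):
    print("Left branch: ", left)
    print("Right branch:", right)
```

The point of generating the expressions rather than typing them is that the thresholds update automatically whenever the first tree is retrained.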


Does that make sense? I haven't tested this yet, but I think it would be the best strategy.