Hi everyone,
I designed a macro that computes the Chow-Liu tree of categorical datasets. You can use it on any other dataset as long as you first turn non-categorical fields into categories, notably through binning (or tiling, as it is called in Alteryx) of numerical fields.
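For readers who want to see what binning amounts to outside Alteryx, here is a minimal Python sketch of equal-width binning. It is only an illustration under assumptions: the function name `equal_width_bins` and the sample values are made up, and it is not meant to reproduce how the Alteryx tiling works internally.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin label 0..n_bins-1 (equal-width bins)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls into a single bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical numeric field turned into three categories.
ages = [18, 22, 35, 41, 58, 63, 79]
age_bins = equal_width_bins(ages, 3)
print(age_bins)
```

The resulting labels can then be treated as an ordinary categorical field.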
The point of a Chow-Liu tree is to show the main dependencies between the variables (categorical fields) at hand. Loosely, you can read “dependency” as “correlation”; more precisely, the measure used is the mutual information (also called transinformation) between each pair of variables. The dependencies are not oriented, so they do not show the direction of the potential causal effects at work between the variables (but Judea Pearl has shown that one cannot infer full causal structures from observational data alone anyhow). The method eliminates the weakest statistical dependencies in order to obtain a maximum-weight spanning tree, where each edge is weighted by mutual information, i.e. a tree without loops, or in other words one in which the path between any two variables is unique. The resulting model is in general an approximation, yet it is very simple to interpret and in most cases quite relevant.
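To make the idea concrete, here is a minimal Python sketch of the general technique: compute the empirical mutual information for every pair of categorical fields, then keep the strongest edges that do not close a loop. This is not the macro's implementation. The function names, the toy data, and the choice of Kruskal's algorithm with a small union-find are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two categorical columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def chow_liu_tree(columns):
    """Return edges (a, b, mi) of a maximum-weight spanning tree.

    columns: dict mapping field name -> list of category labels (equal lengths).
    """
    # All pairwise dependencies, strongest first.
    edges = sorted(
        ((mutual_information(columns[a], columns[b]), a, b)
         for a, b in combinations(sorted(columns), 2)),
        reverse=True,
    )
    parent = {name: name for name in columns}

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for mi, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:  # keep the edge only if it does not create a loop
            parent[ra] = rb
            tree.append((a, b, mi))
    return tree


# Toy data (hypothetical): "coat" is fully determined by "season",
# while "coin" is unrelated to both.
data = {
    "season": ["w", "w", "s", "s", "w", "s", "w", "s"],
    "coat":   ["y", "y", "n", "n", "y", "n", "y", "n"],
    "coin":   ["h", "t", "h", "t", "t", "h", "h", "t"],
}
tree = chow_liu_tree(data)
for a, b, mi in tree:
    print(f"{a} -- {b}: {mi:.3f}")
```

With three fields the tree has two edges. The strongest one links "coat" and "season", and the weak "coin" dependencies only survive as far as needed to keep the tree connected.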
You may use this tree…
Have fun with this macro. If it turns out to be useful for you (!), do not hesitate to give feedback about your implementation and the kind of datasets you applied the macro to.
Gottfried
Technical details about the workflow
https://en.wikipedia.org/wiki/Chow-Liu_tree
The main steps in the Alteryx macro are:
I used an MST algorithm written in R, found on CRAN (Comprehensive R Archive Network), which I adapted for the R tool in Alteryx.
https://rdrr.io/cran/edmcr/src/R/mst.R
You may also use the very nice “pure Alteryx” macro designed by @StephaneP (Stéphane Portier) based on the famous Prim’s algorithm.
https://en.wikipedia.org/wiki/Prim%27s_algorithm
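As a rough illustration of Prim's algorithm applied to this problem, here is a short Python sketch that grows a maximum-weight spanning tree from a table of pairwise weights. It is neither the R code nor the Alteryx macro mentioned above. The function name, the `frozenset` keys, and the example weights are assumptions made for illustration.

```python
import heapq


def prim_max_spanning_tree(nodes, weight):
    """Prim's algorithm, growing the tree by the heaviest edge leaving it.

    nodes: list of node names.
    weight: dict mapping frozenset({a, b}) -> float (complete undirected graph).
    """
    start = nodes[0]
    in_tree = {start}
    # heapq is a min-heap, so weights are negated to pop the heaviest edge first.
    heap = [(-weight[frozenset((start, v))], start, v) for v in nodes[1:]]
    heapq.heapify(heap)
    tree = []
    while heap and len(in_tree) < len(nodes):
        neg_w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # stale entry: v was already reached by a heavier edge
        in_tree.add(v)
        tree.append((u, v, -neg_w))
        for w in nodes:
            if w not in in_tree:
                heapq.heappush(heap, (-weight[frozenset((v, w))], v, w))
    return tree


# Hypothetical pairwise dependency strengths (e.g. mutual information).
nodes = ["A", "B", "C", "D"]
weight = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.1,
    frozenset(("A", "D")): 0.2,
    frozenset(("B", "C")): 0.8,
    frozenset(("B", "D")): 0.3,
    frozenset(("C", "D")): 0.7,
}
mst = prim_max_spanning_tree(nodes, weight)
for u, v, w in mst:
    print(f"{u} -- {v}: {w}")
```

A standard minimum spanning tree routine gives the same tree if the weights are negated first, which is one way to reuse an existing MST implementation for this purpose.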
This is great! Top work and thanks for sharing!
If you publish it on the Alteryx Gallery, you may reach more people as well!
Thank you for the tip, @KaneG
I am glad you liked the macro and would be interested to hear about any practical use you make of it.
This is the first time I have posted something in the Gallery, so I hope I set it up correctly.
https://gallery.alteryx.com/#!app/Chow-Liu-macro/5fc4afe28a933716882b21cd
Regards,
Gottfried
Hello Community,
I am posting an adjusted version of my Chow-Liu macro.
I found out that the latest update of Alteryx Designer seems to have changed the way the cross-table tool sorts crossed fields in its output. This affected the macro, which relied on the previous behavior. On my end, this now seems fixed and the macro runs fine with the new version.
For further info about the purpose of the macro, please refer to my previous publication below.
Best regards,
Gottfried