
Chow-Liu macro

Hi everyone,

I designed a macro that computes the Chow-Liu tree of categorical datasets. You can use it on any other dataset as long as you first turn non-categorical fields into categories, notably through binning (or tiling, as it is called in Alteryx) of numerical fields.
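For readers outside Alteryx, here is a minimal Python sketch of what binning into tiles of roughly equal record count could look like. It is not the Alteryx Tile tool and not part of the macro; the function name `tile_equal_records`, the `tile_k` labels, and the sample ages are assumptions made up for illustration.

```python
from bisect import bisect_right

def tile_equal_records(values, n_tiles):
    """Label each numeric value with one of n_tiles bins of roughly equal size."""
    ordered = sorted(values)
    n = len(ordered)
    # Cut points sit at the 1/n_tiles, 2/n_tiles, ... positions of the sorted data.
    cuts = [ordered[(k * n) // n_tiles] for k in range(1, n_tiles)]
    # bisect_right counts the cut points <= value, which gives the bin index.
    return [f"tile_{bisect_right(cuts, v) + 1}" for v in values]

# Hypothetical numeric field turned into a categorical one.
ages = [23, 35, 47, 51, 62, 29, 38, 44]
age_tiles = tile_equal_records(ages, 4)
print(age_tiles)
```

The resulting labels can then be treated like any other categorical field. The number of tiles is a modelling choice: too many tiles leave few records per category, too few hide structure.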

The point of a Chow-Liu tree is to show the main dependencies between the variables (categorical fields) at hand. Loosely speaking, you can read “dependency” as “correlation”; the measure actually used is mutual information, also called transinformation. The dependencies will not be oriented and thus do not show the direction of the potential causal effects at work between the variables (but Judea Pearl has shown that one cannot infer full causal structures from observational data alone anyhow). The method eliminates the weakest statistical dependencies in order to obtain a spanning tree that keeps the strongest ones (maximum total mutual information), i.e. a graph without loops, or in other words one in which the path between any two variables is unique. The model obtained is in general an approximation, yet one that is very simple to interpret and in most cases quite relevant.
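For reference, the standard textbook definitions behind this paragraph can be written as follows. The notation is an assumption chosen here, not taken from the macro: $p$ denotes the empirical distribution, $T$ a spanning tree over the variables, and $\pi(i)$ the parent of variable $i$ once $T$ is rooted at an arbitrary node.

```latex
% Mutual information (transinformation) between two categorical variables
I(X;Y) = \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}

% Tree-structured approximation of the joint distribution
% (the factor for the root is simply p(x_root))
P_T(x_1,\dots,x_n) = \prod_{i=1}^{n} p\bigl(x_i \mid x_{\pi(i)}\bigr)

% Chow and Liu's decomposition of the approximation error
D_{\mathrm{KL}}(P \,\|\, P_T)
  = -\sum_{(i,j)\in T} I(X_i;X_j) + \sum_{i=1}^{n} H(X_i) - H(X_1,\dots,X_n)
```

Only the first term depends on the choice of tree, so the best tree approximation is the spanning tree with the largest total mutual information over its edges.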

You may use this tree…

* to get a quick overview of your dataset and its internal dependencies;
* before considering predictive analytics, so you focus on the most relevant influencers of your target variable;
* as a starting point to build a Bayesian network;
* to build a causal model (e.g. a causal Bayesian network) by turning the non-oriented edges into causal arrows.
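As an illustration of the last point, one purely mechanical way to orient a tree is to pick a root and point every edge away from it. The sketch below assumes the tree is given as a list of undirected pairs; the function name `orient_tree` and the variable names A to D are hypothetical. Which variable is the root, and whether the arrows deserve a causal reading, remains a domain-knowledge decision that the tree itself cannot make.

```python
from collections import defaultdict, deque

def orient_tree(edges, root):
    """Turn undirected tree edges into (parent, child) arrows pointing away from root."""
    neighbours = defaultdict(list)
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    arrows, seen, queue = [], {root}, deque([root])
    # Breadth-first walk: each newly reached node becomes a child of the node that reached it.
    while queue:
        parent = queue.popleft()
        for child in neighbours[parent]:
            if child not in seen:
                seen.add(child)
                arrows.append((parent, child))
                queue.append(child)
    return arrows

# Hypothetical undirected tree over four variables.
undirected = [("A", "B"), ("B", "C"), ("A", "D")]
print(orient_tree(undirected, root="A"))
```

Choosing a different root gives a different set of arrows over the same skeleton, which is exactly why the orientation step needs outside knowledge.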

Have fun with this macro. If it turns out to be useful for you (!), do not hesitate to give feedback about your implementation and the kind of datasets you applied the macro to.

Gottfried

Technical details about the workflow

https://en.wikipedia.org/wiki/Chow-Liu_tree

The main steps in the Alteryx macro are:

* Data acquisition and preparation (notably removing unique-value fields).
* Calculate the transinformation between all pairs of variables (categorical fields).
* Run a minimum-weight spanning tree (MST) algorithm using negative transinformation as the weight, so that the minimum-weight tree is the one with the largest total transinformation.
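The first two steps can be sketched in plain Python as below. This is not the macro's workflow, only an illustration under stated assumptions: the data is a dict of column name to list of category labels, the names `mutual_information` and `pairwise_mi` and the sample data are hypothetical, and "unique-value fields" is interpreted here as both constant fields and fields where every record has its own value (such as IDs).

```python
from collections import Counter
from itertools import combinations
from math import log

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two categorical columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts.
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())

def pairwise_mi(columns):
    """columns: dict of name -> list of labels. Returns {(name_a, name_b): MI}."""
    # Assumption: drop constant fields and fields with one distinct value per record.
    usable = {k: v for k, v in columns.items() if 1 < len(set(v)) < len(v)}
    return {(a, b): mutual_information(usable[a], usable[b])
            for a, b in combinations(usable, 2)}

# Hypothetical toy dataset.
data = {
    "id": ["r1", "r2", "r3", "r4"],
    "colour": ["red", "red", "blue", "blue"],
    "shape": ["round", "round", "square", "square"],
    "size": ["s", "l", "s", "l"],
}
for pair, value in pairwise_mi(data).items():
    print(pair, round(value, 3))
```

In this toy data, colour and shape determine each other, so their mutual information is log 2, while size is independent of both and scores zero. The `id` column is dropped because it would otherwise appear strongly "dependent" on everything.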

I used an MST algorithm written in R, found on CRAN (Comprehensive R Archive Network), which I adapted for the R tool of Alteryx.

https://rdrr.io/cran/edmcr/src/R/mst.R
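To show how negating the weights turns a minimum-weight spanning tree routine into a Chow-Liu tree builder, here is a small Kruskal-style sketch in Python. It is not the R code linked above and makes no claim about which algorithm that code uses; the function name `minimum_spanning_tree` and the mutual information values are made up for illustration.

```python
def minimum_spanning_tree(nodes, weights):
    """Kruskal's algorithm. weights: {(a, b): weight}. Returns the chosen (a, b) edges."""
    parent = {v: v for v in nodes}

    def find(v):
        # Follow parent links to the component representative (with path halving).
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    # Visit edges from lightest to heaviest and keep those that create no loop.
    for (a, b), _ in sorted(weights.items(), key=lambda kv: kv[1]):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            tree.append((a, b))
    return tree

# Hypothetical mutual information values between four variables.
mi = {("A", "B"): 0.60, ("A", "C"): 0.10, ("B", "C"): 0.45,
      ("C", "D"): 0.30, ("B", "D"): 0.05}
negated = {edge: -value for edge, value in mi.items()}
print(minimum_spanning_tree(["A", "B", "C", "D"], negated))
```

Because the weights are negated, the lightest edges are the strongest dependencies, so the weak links A-C and B-D are the ones left out of the tree.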

You may also use the very nice “pure Alteryx” macro designed by @StephaneP (Stéphane Portier) based on the famous Prim’s algorithm.

Prism s algorythmv3.yxzp

https://en.wikipedia.org/wiki/Prim%27s_algorithm
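For comparison, the idea behind Prim's algorithm can be sketched in a few lines of Python. This is not @StephaneP's macro, only an illustration; the function name `prim_max_spanning_tree` and the mutual information values are hypothetical. The tree grows from an arbitrary start node by repeatedly adding the strongest dependency that reaches a variable not yet in the tree.

```python
import heapq

def prim_max_spanning_tree(nodes, mi):
    """Prim's algorithm on mutual information. Returns (from, to, MI) edges."""
    adjacency = {v: [] for v in nodes}
    for (a, b), w in mi.items():
        adjacency[a].append((w, b))
        adjacency[b].append((w, a))
    start = nodes[0]
    in_tree = {start}
    # heapq is a min-heap, so mutual information is stored negated.
    frontier = [(-w, start, nb) for w, nb in adjacency[start]]
    heapq.heapify(frontier)
    edges = []
    while frontier and len(in_tree) < len(nodes):
        neg_w, a, b = heapq.heappop(frontier)
        if b in in_tree:
            continue  # this edge would close a loop
        in_tree.add(b)
        edges.append((a, b, -neg_w))
        for w, nb in adjacency[b]:
            if nb not in in_tree:
                heapq.heappush(frontier, (-w, b, nb))
    return edges

# Hypothetical mutual information values between four variables.
mi = {("A", "B"): 0.60, ("A", "C"): 0.10, ("B", "C"): 0.45,
      ("C", "D"): 0.30, ("B", "D"): 0.05}
print(prim_max_spanning_tree(["A", "B", "C", "D"], mi))
```

When all weights are distinct, Prim's and Kruskal's algorithms return the same tree, so the choice between them is mostly a matter of which is easier to express in the tool at hand.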

Chow-Liu package.yxzp

Macros

Machine Learning

R Tool

Accepted answers

Gottfried

Hello Community,

I am posting an adjusted version of my Chow-Liu macro.

I found out that the latest update of Alteryx Designer seems to have changed the way the cross-table tool sorts the crossed fields in its output. This broke the macro, which relied on the previous behavior. On my end, this now seems fixed, and the macro runs fine with the new version.

For further info about the purpose of the macro, please refer to my previous publication below.

Best regards,

Gottfried

Chow Liu macro v2022.05.11.yxzp

All comments

KaneG

This is great! Top work and thanks for sharing!

If you publish it on the Alteryx Gallery, then you may reach more people as well!

Gottfried

Thank you for the tip, @KaneG

I am glad you liked the macro and would be interested to hear about any practical implementation of it on your side.

It is the first time I have posted something in the Gallery, so I hope I set it up correctly.

https://gallery.alteryx.com/#!app/Chow-Liu-macro/5fc4afe28a933716882b21cd

Regards,

Gottfried

