Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Chow-Liu macro

Gottfried
7 - Meteor

Hi everyone,

I designed a macro that computes the Chow-Liu tree of categorical datasets. You can use it on any other dataset as long as you first turn non-categorical fields into categories, notably trough binning (or tiling as it is called in Alteryx) of numerical fields.

 

The point of a Chow-Liu tree is to show the main dependencies between the variables (categorical fields) at hand. You can replace the word “dependency” with “correlation” or “mutual information” or “transinformation” or “cross-entropy”. The dependencies will not de oriented thus do not show the direction of the potential causal effects at work between the variables (but Judea Pearl has proven that one cannot infer full causal structures based only observational data anyhow). The method eliminates the weakest statistical dependencies in order to achieve a maximum-entropy spanning tree, i.e. a tree without loops or in other words in which the path between any two variables is unique. The model obtained is in general an approximation yet a very simple to interpret and in most cases quite relevant one.

 

You may use this tree…

  • to get a quick overview of your dataset and its internal dependencies;
  • before considering predictive analytics so you focus on most relevant influencers on your target variable;
  • as starting point to build a Bayesian network;
  • to build a causal model (e.g. a causal Bayesian network) by turning the non-oriented edges into causal arrows.

 

Have fun with this macro. In case it turns out to be useful for you (!) do not hesitate to give feedback about your implementation and the kind of datasets you applied the macro to.

Gottfried

 

Technical details about the workflow

 

https://en.wikipedia.org/wiki/Chow-Liu_tree

 

The main steps in the Alteryx macro are:

  • Data acquisition and preparation (notably removing unique value fields)
  • Calculate transinformation between all pairs of variables (categorical fields).
  • Run a minimum-weight spanning tree (MST) algorithm using transinformation (actually negative transinformation) as weight.

I used a MST algorithm written in R found on CRAN (Comprehensive R Archive Network) which I adapted in the R tool of Alteryx.

https://rdrr.io/cran/edmcr/src/R/mst.R

 

You may also use the very nice “pure Alteryx” macro designed by @StephaneP  (Stéphane Portier) based on the famous Prim’s algorithm.

Prism s algorythmv3.yxzp

https://en.wikipedia.org/wiki/Prim%27s_algorithm

3 REPLIES 3
KaneG
Alteryx Alumni (Retired)

This is great! Top work and thanks for sharing!

 

In you publish it on the Alteryx Gallery, then you may reach more people as well!

Gottfried
7 - Meteor

Thank you for the tip, @KaneG 

I am glad you liked the macro and would be interested to hear from any of your practical implementation of it.

 

It is the first time I post something in the Gallery, so I hope I fixed it all right.

https://gallery.alteryx.com/#!app/Chow-Liu-macro/5fc4afe28a933716882b21cd

 

Regards,

Gottfried

 

Gottfried
7 - Meteor

Hello Community,

I post an adjusted version of my Chow-Liu macro.

 

I found out the the latest update of Alteryx Designer seems to have modified the behavior of the cross-table tool in the way crossed fields are sorted in the output. This affected the macro which was based on a different behavior. At my end, this seems now fixed and running fine with this new version.

 

For further info about the purpose of the macro, please refer to my previous publication below.

 

Best regards,

Gottfried

Labels