
Tool Mastery | Contingency Table

EddieW
Alteryx

This article is part of the Tool Mastery Series, a compilation of Knowledge Base contributions to introduce diverse working examples for Designer Tools. Here we’ll delve into uses of the Contingency Table Tool on our way to mastering the Alteryx Designer:

ContingencyTable.png

 

The Contingency Table tool is part of the Data Investigation category in Alteryx Designer, which is included in the predictive tools installation. As the name suggests, you use the Contingency Table tool to create a contingency table. Contingency tables summarize categorical variables and help you understand how those variables may be related to one another.

 

 

DataInves.PNG

 

 

Frequency Tables and Contingency Tables

 

Contingency tables are a type of frequency distribution table. A frequency table summarizes the distribution of values for each variable in a data set. It shows how often each distinct value occurs for each variable, and how these distinct values are distributed. You can use the Frequency Table tool in Alteryx (also part of the Data Investigation category) to create a standard frequency table.
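To make the idea of a per-field frequency table concrete, here is a minimal Python sketch. It is not how the Frequency Table tool is implemented; the field names are borrowed from the screenshots in this article, and the sample values and the `frequency_table` helper are assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical records with two categorical fields.
records = [
    {"InputField_A": "x", "InputField_B": "red"},
    {"InputField_A": "x", "InputField_B": "blue"},
    {"InputField_A": "y", "InputField_B": "red"},
    {"InputField_A": "x", "InputField_B": "red"},
]

def frequency_table(records, field):
    """Return (value, frequency, percent, cumulative frequency, cumulative percent) rows."""
    counts = Counter(rec[field] for rec in records)
    total = sum(counts.values())
    out, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        out.append((value, counts[value], 100 * counts[value] / total,
                    running, 100 * running / total))
    return out

# Each field is summarized on its own; the two fields are never combined.
freq_a = frequency_table(records, "InputField_A")
freq_b = frequency_table(records, "InputField_B")
for row in freq_a + freq_b:
    print(row)
```

Each call looks at a single field, which is the key limitation a contingency table addresses.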

 

A contingency table tells us how frequently distinct values occur for one variable in relation to another variable. A contingency table displays values in a matrix format, where the values of the variables being compared make up the rows and columns of the table. Each cell of a contingency table contains the number of rows (observations) in your original data set that have that combination of values for the two variables.
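The matrix layout described above can be sketched in a few lines of Python. This is only an illustration of the concept, not the tool's implementation; the sample data and the `contingency_table` helper are hypothetical.

```python
from collections import Counter

# Hypothetical sample data: one (gender, plan) pair per observation.
rows = [
    ("F", "Basic"), ("F", "Premium"), ("M", "Basic"),
    ("F", "Basic"), ("M", "Premium"), ("M", "Basic"),
]

def contingency_table(pairs):
    """Count observations for each (row value, column value) combination."""
    counts = Counter(pairs)
    row_values = sorted({r for r, _ in pairs})
    col_values = sorted({c for _, c in pairs})
    # Combinations that never occur get a count of 0, so the matrix stays rectangular.
    return {r: {c: counts[(r, c)] for c in col_values} for r in row_values}

table = contingency_table(rows)
for row_value, cells in table.items():
    print(row_value, cells)
```

The values of the first field become the rows, the values of the second field become the columns, and each cell holds a count of observations.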

 

You may be wondering why you would want to use the Contingency Table tool instead of the Frequency Table tool. The most important difference is that the Frequency Table tool looks at each field individually and returns the frequency, percent, and cumulative totals of each value of that variable in the data set:

 

 

contin.PNG

*Frequency Table: in the screenshot above, the result tracks frequency column by column. It does not combine column A with column B.

 

 

 

In contrast, the Contingency Table tool returns a list of all of the possible combinations of variable values for the selected fields, along with frequency and percent columns for each combination.

 

 

Capture2.PNG

*Contingency Table: in the screenshot above, the result tracks frequency by the combination of InputField_A and InputField_B.

 

 

Another important feature of the Contingency Table tool is the option to include a chi-squared statistic when you are analyzing two variables. A chi-squared test is a way to quantitatively determine whether there is a statistically significant relationship between variables. The null hypothesis is that the two variables are independent. A large chi-squared test statistic, relative to the degrees of freedom, yields a small p-value and suggests that there is a statistically significant relationship between the variables. A small chi-squared test statistic yields a large p-value, meaning the data give no evidence of a relationship. Feel free to refer to this link if you want to learn more about how to calculate the chi-squared test statistic.
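As a rough illustration of how the statistic behaves, the sketch below computes a plain Pearson chi-squared statistic for a table of counts using only the Python standard library. It is an assumption-laden example, not the tool's code: the counts are invented, no continuity correction is applied (so the tool's reported values may differ), and the p-value helper is only valid for a 2x2 table, where the degrees of freedom equal 1.

```python
import math

def chi_squared(observed):
    """Pearson chi-squared test of independence for a table of counts.

    `observed` is a list of rows, each a list of cell counts.
    Returns (statistic, degrees of freedom).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

def p_value_df1(stat):
    """Upper-tail p-value, valid only when df == 1 (a 2x2 table)."""
    return math.erfc(math.sqrt(stat / 2))

# Hypothetical counts with a strong association between the two variables.
stat, df = chi_squared([[30, 10], [10, 30]])
p = p_value_df1(stat)

# Hypothetical counts with no association at all.
stat_none, _ = chi_squared([[20, 20], [20, 20]])
p_none = p_value_df1(stat_none)

print(stat, df, p)        # large statistic, small p-value
print(stat_none, p_none)  # statistic of 0, p-value of 1
```

The associated table produces a large statistic and a tiny p-value, while the perfectly balanced table produces a statistic of zero, which matches the interpretation given above.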

 

How to Configure the Contingency Table Tool

 

The first step in configuring the Contingency Table tool is to determine if you would like to include a chi-squared statistic or not:

 

  1. Include chi-squared statistic: Selecting this option limits the number of variables that can be selected to a maximum of two.

     config1.PNG

     

     

    With "Include chi-squared statistic" selected, the tool will return values for chi-squared, df (degrees of freedom), and p-value in the R output anchor, as seen below:

     

    Capture.PNG

     

  2. Do not include chi-squared statistic: If you do not need a chi-squared statistic to be calculated, you can choose the second option. With this option, you will need to select between two and four variables to be analyzed.

     config2.PNG

     

 

As documented on the tool's Help page, the following restrictions apply to the variables you select:

 

  1. You can only select each variable once. If you do not select a unique variable for each drop-down option, the tool will throw an error:

     Capture.PNG

  2. The following field types cannot be selected, and will not appear as options: FixedDecimal, Float, Double, Date, Time, DateTime, Blob, and SpatialObj. This is because a contingency table is intended for categorical variables. Integer field types are allowed but should only be used if the field is truly categorical.

     Capture1.PNG
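The configuration rules above can be summarized as a small checklist. The Python sketch below is a hypothetical pre-check you might run on your own metadata before building a workflow; it is not part of Alteryx Designer, and `check_selection`, the field names, and the assumption that exactly two variables are required when the chi-squared option is on are all illustrative.

```python
# Field types the article lists as not selectable in the tool.
EXCLUDED_TYPES = {"FixedDecimal", "Float", "Double", "Date", "Time",
                  "DateTime", "Blob", "SpatialObj"}

def check_selection(fields, field_types, include_chi_squared):
    """Return a list of problems with a proposed variable selection."""
    problems = []
    if len(set(fields)) != len(fields):
        problems.append("each variable can only be selected once")
    if include_chi_squared:
        # Assumption: the chi-squared option needs exactly two variables.
        if len(fields) != 2:
            problems.append("select exactly 2 variables when chi-squared is included")
    elif not 2 <= len(fields) <= 4:
        problems.append("select between 2 and 4 variables")
    for name in fields:
        if field_types.get(name) in EXCLUDED_TYPES:
            problems.append(f"{name} has an unsupported type: {field_types[name]}")
    return problems

# Hypothetical field metadata.
types = {"Region": "V_String", "Segment": "Int32",
         "Channel": "V_String", "Revenue": "Double"}

print(check_selection(["Region", "Segment"], types, True))             # no problems
print(check_selection(["Region", "Region"], types, False))             # duplicate field
print(check_selection(["Region", "Segment", "Channel"], types, True))  # too many fields
print(check_selection(["Region", "Revenue"], types, False))            # numeric field
```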

     

 

Viewing the Output

 

There are three output anchors for the Contingency Table tool:

 

D anchor -  D stands for "data". When clicking on the D anchor, you will see the following fields:

 

a. Original field name of the input data - The number of fields that show up here will directly depend on the number of variables that you selected in the configuration window of the Contingency Table tool.

 

OutputAnchor.png

 

b. Frequency – The number of times the selected variable combination occurs in the dataset.

c. Percent – Percentage of the combination's frequency in the dataset [(Frequency/Total Records) * 100]
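The Frequency and Percent fields can be reproduced by hand with the formula above. The following Python sketch is one way to do that for three selected variables; the records and the `combination_frequencies` helper are hypothetical, and the sketch lists only combinations that actually occur in the data, which may differ from what the tool emits.

```python
from collections import Counter

# Hypothetical records: one tuple of three categorical values per row.
records = [
    ("x", "red", "S"),
    ("x", "red", "S"),
    ("x", "blue", "L"),
    ("y", "red", "S"),
]

def combination_frequencies(records):
    """Return (field values..., Frequency, Percent) for each observed combination."""
    counts = Counter(records)
    total = len(records)
    # Percent = (Frequency / Total Records) * 100
    return [(*combo, freq, 100 * freq / total)
            for combo, freq in sorted(counts.items())]

output = combination_frequencies(records)
for row in output:
    print(row)
```

Because each record is treated as a tuple, the same function works for two, three, or four selected variables.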

 

R anchor – This anchor contains a report (R) showing a contingency table for each combination of field values with the Total Frequency and Percent for each row and column.
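The row and column totals shown in the report are simple sums over the matrix. As a hedged sketch of that arithmetic (not the report's actual rendering code), the Python below adds totals to a hypothetical two-by-two table; the `add_totals` helper and the data are assumptions.

```python
def add_totals(table):
    """Given {row: {col: count}}, return row totals, column totals, and the grand total."""
    row_totals = {r: sum(cells.values()) for r, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for c, n in cells.items():
            col_totals[c] = col_totals.get(c, 0) + n
    grand = sum(row_totals.values())
    return row_totals, col_totals, grand

# Hypothetical contingency table of counts.
table = {"F": {"Basic": 2, "Premium": 1}, "M": {"Basic": 2, "Premium": 1}}
row_totals, col_totals, grand = add_totals(table)

# Percent of all records that fall in each row and each column.
row_percent = {r: 100 * n / grand for r, n in row_totals.items()}
col_percent = {c: 100 * n / grand for c, n in col_totals.items()}
print(row_totals, col_totals, grand)
print(row_percent, col_percent)
```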

 

report.png

 

 

I anchor - The I anchor holds an interactive output that lets the user customize which chart is displayed.

interactive.png

 

 

 

 

 

The Contingency Table tool gives us an overview of the data by showing the frequency and percentage of each combination of values. It helps us identify abnormalities and relationships in our data, so we can account for them before performing any advanced analytics. The Contingency Table tool can be very useful, and I hope that with this article you are ready to give it a try and put it to use!

 

By now, you should have expert-level proficiency with the Contingency Table tool! If you can think of a use case we left out, feel free to use the comments section below! Consider yourself a Tool Master already? Let us know at community@alteryx.com if you’d like your creative tool uses to be featured in the Tool Mastery Series.

  

Stay tuned with our latest posts every Tool Tuesday by following Alteryx on Twitter! If you want to master all the Designer tools, consider subscribing for email notifications.

Comments
Tim_Lang
6 - Meteoroid

Is there a similar tool to equally spread out a measured contingency? For instance, I'm trying to create a complex sequence to spread out various contingencies to spread characteristics equally based on their frequency.

 

Let's say for the above example we have 40 men and 60 women and we want to spread the women and men equally, in order. Also, what if we added another variable to each row of data and wanted to spread this variable by category across the sequence?

wellis
6 - Meteoroid

Hi, I'm confused by the interpretation of a high Chi-squared statistic: 'A large chi-square test statistic suggests there is not a relationship between the variables.'

 

The null hypothesis in a contingency test of independence is that there is no association b/w the two variables - that is, they are independent. A high Chi-squared statistic (if the p-value is less than your chosen significance level) would indicate that you should reject the null hypothesis - i.e., there is an indication that there is an association or relationship b/w the variables.

 

I also wondered if the Alteryx tool tests and warns re the rule of 5 - that there should be a count of at least 5 for the expected value of any cell in the table. Thanks.

SaiKrishna2589
8 - Asteroid

 

 

Yea I think the interpretation in the article on chi-square test needs to be reviewed.

Generally, what I read is that a higher chi-square test statistic suggests a higher likelihood of a significant association between the variables.