on 01-29-2019 09:44 AM - edited on 03-08-2019 12:13 PM by Community_Admin
This article is part of the Tool Mastery Series, a compilation of Knowledge Base contributions that introduce diverse working examples for Designer tools. Here we'll delve into uses of the Contingency Table tool on our way to mastering Alteryx Designer.
The Contingency Table tool is part of the Data Investigation category in Alteryx Designer, which comes with the predictive tools installation. As the name suggests, you can use the Contingency Table tool to create a contingency table. Contingency tables summarize categorical variables and help you understand how variables may be related to one another.
Frequency Tables and Contingency Tables
Contingency tables are a type of frequency distribution table. A frequency table summarizes the distribution of values for each variable in a data set. It shows how frequently each distinct value occurs for each variable, and how those distinct values are distributed. You can use the Frequency Table tool in Alteryx (also part of the Data Investigation category) to create a standard frequency table.
A contingency table tells us how frequently distinct values occur for one variable in relation to another variable. A contingency table displays values in a matrix format, where the values of the variables being compared make up the rows and columns of the table. Each cell of a contingency table contains the number of rows (observations) in your original data set that have that combination of values for the two variables.
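To make the matrix layout concrete, here is a minimal sketch in plain Python, outside of Alteryx. The sample rows and the helper name `contingency_matrix` are made up for illustration and are not part of the tool; the sketch only shows how each cell counts the observations for one combination of values.

```python
from collections import Counter

# Hypothetical observations: one (field_a, field_b) pair per input row.
rows = [
    ("M", "Yes"), ("M", "No"), ("F", "Yes"),
    ("F", "Yes"), ("M", "Yes"), ("F", "No"),
]

def contingency_matrix(pairs):
    """Return (row_labels, col_labels, matrix) of observation counts."""
    counts = Counter(pairs)  # counts each (a, b) combination
    row_labels = sorted({a for a, _ in pairs})
    col_labels = sorted({b for _, b in pairs})
    # A combination that never occurs gets a count of 0 from the Counter.
    matrix = [[counts[(a, b)] for b in col_labels] for a in row_labels]
    return row_labels, col_labels, matrix

row_labels, col_labels, matrix = contingency_matrix(rows)
print("Columns:", col_labels)
for label, counts_row in zip(row_labels, matrix):
    print(label, counts_row)
```

The values of the first variable become the row labels, the values of the second become the column labels, and every cell is a simple count.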
You may be wondering why you would want to use the Contingency Table tool instead of the Frequency Table tool. The most important difference between the two is that the Frequency Table tool looks at each field individually and returns the cumulative frequency and percent of each value of that variable in the data set:
*Frequency Table – Looking at the screenshot above, you can see the result is tracking frequency by column. It is not combining column A with column B.
The Contingency Table tool, by contrast, returns a list of all of the possible combinations of variable values for the selected fields, as well as frequency and percent columns for each combination.
*Contingency Table – Looking at the screenshot above, you can see the result is tracking frequency by the combination of InputField_A and InputField_B.
Another important feature of the Contingency Table tool is the option to include a chi-squared statistic when you are analyzing two variables. A chi-squared test is a way to quantitatively determine whether there is a statistically significant relationship between variables. A large chi-squared test statistic (and a correspondingly small p-value) suggests that there is a statistically significant relationship between the variables. A small chi-squared test statistic (and a large p-value) means the data do not provide evidence of a relationship between the variables. Feel free to refer to this link if you want to learn more about how to calculate the chi-squared test statistic.
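As a rough illustration of the calculation, the sketch below computes a Pearson chi-squared statistic, the degrees of freedom, and a p-value for a small table of counts using only the Python standard library. The function names and the sample counts are hypothetical, and the sketch applies no continuity correction, so its numbers may not match the tool's report exactly. It also assumes that no row or column of the table sums to zero.

```python
import math

def chi_squared_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable (df >= 1)."""
    half_x = x / 2.0
    # Closed-form starting points: df = 1 (odd case) or df = 2 (even case).
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(half_x)), 0.5
    else:
        q, a = math.exp(-half_x), 1.0
    # Incomplete-gamma recurrence: Q(a + 1, x) = Q(a, x) + x^a e^-x / Gamma(a + 1).
    while a < df / 2.0:
        q += half_x ** a * math.exp(-half_x) / math.gamma(a + 1.0)
        a += 1.0
    return q

def chi_squared_test(observed):
    """Return (statistic, df, p_value) for a table of counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, chi_squared_sf(statistic, df)

# Hypothetical 2x2 table of counts.
statistic, df, p_value = chi_squared_test([[30, 10], [20, 40]])
print(f"chi-squared = {statistic:.3f}, df = {df}, p-value = {p_value:.6f}")
```

In this made-up table the observed counts are far from the counts expected under independence, so the statistic is large and the p-value is small, which points toward a relationship between the two variables.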
How to Configure the Contingency Table Tool
The first step in configuring the Contingency Table tool is to determine if you would like to include a chi-squared statistic or not:
With include chi-squared statistic selected, the tool will return values for chi-squared, df (degrees of freedom), and p-value in the R output anchor, as seen below:
As documented in the tool's Help page, the tool will only support variables that meet the following criteria:
Viewing the Output
There are three output anchors for the Contingency Table tool:
D anchor - D stands for "data". When clicking on the D anchor, you will see the following fields:
a. Original field name of the input data - The number of fields that show up here will directly depend on the number of variables that you selected in the configuration window of the Contingency Table tool.
b. Frequency – The number of times the selected variable combination occurs in the dataset.
c. Percent – The combination's frequency as a percentage of the dataset: (Frequency / Total Records) * 100.
R anchor – This anchor contains a report (R) showing a contingency table for each combination of field values with the Total Frequency and Percent for each row and column.
I anchor - The I anchor holds an Interactive output that allows the user to customize what chart displays.
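The shape of the D anchor output can be approximated outside of Alteryx with a short sketch like the one below. It lists every possible combination of values for two fields with a Frequency and a Percent column, computing Percent as (Frequency / Total Records) * 100. The sample data and the function name are hypothetical, the field names are borrowed from the example above, and the sketch assumes a non-empty input.

```python
from collections import Counter
from itertools import product

# Hypothetical input rows: (InputField_A, InputField_B).
rows = (
    [("East", "Gold")] * 3
    + [("East", "Silver")] * 1
    + [("West", "Silver")] * 4
)

def combination_frequencies(pairs):
    """List every value combination with its frequency and percent."""
    counts = Counter(pairs)
    total = len(pairs)  # total records; assumed to be greater than zero
    values_a = sorted({a for a, _ in pairs})
    values_b = sorted({b for _, b in pairs})
    return [
        {
            "InputField_A": a,
            "InputField_B": b,
            "Frequency": counts[(a, b)],
            "Percent": counts[(a, b)] / total * 100,
        }
        # product() also yields combinations that never occur (frequency 0).
        for a, b in product(values_a, values_b)
    ]

table = combination_frequencies(rows)
for record in table:
    print(record)
```

Because every possible combination is listed, a pair of values that never appears together in the data shows up with a frequency of 0, and the Percent column sums to 100 across all combinations.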
The Contingency Table tool gives us an overview of the data by showing the frequency of each combination of values and the percentage of the data set it represents. This helps us identify abnormalities and relationships in our data, so we can account for them when performing any advanced analytics. The Contingency Table tool can be very useful, and I hope that with this article you are ready to give it a try and put it to use!
By now, you should have expert-level proficiency with the Contingency Table tool! If you can think of a use case we left out, feel free to use the comments section below! Consider yourself a Tool Master already? Let us know at community@alteryx.com if you'd like your creative tool uses to be featured in the Tool Mastery Series.
Stay tuned with our latest posts every Tool Tuesday by following Alteryx on Twitter! If you want to master all the Designer tools, consider subscribing for email notifications.
Is there a similar tool to equally spread out a measured contingency? For instance, I'm trying to create a complex sequence to spread out various contingencies to spread characteristics equally based on their frequency.
Let's say that for the above example we have 40 men and 60 women and we want to spread the women and men equally, in order. Also, what if we added another variable to each row of data and wanted to spread this variable by category across the sequence?
Hi, I'm confused by the interpretation of a high Chi-squared statistic: 'A large chi-square test statistic suggests there is not a relationship between the variables.'
The null hypothesis in a contingency test of independence is that there is no association between the two variables; that is, they are independent. A high chi-squared statistic (if the p-value is less than your chosen significance level) would indicate that you should reject the null hypothesis, i.e., there is an indication of an association or relationship between the variables.
I also wondered whether the Alteryx tool tests for and warns about the rule of 5: that the expected value of every cell in the table should be at least 5. Thanks.