
Alteryx Knowledge Base


Tool Mastery | Association Analysis

Alteryx

This article is part of the Tool Mastery Series, a compilation of Knowledge Base contributions to introduce diverse working examples for Designer Tools. Here we’ll delve into uses of the Association Analysis Tool on our way to mastering the Alteryx Designer:

  

The Association Analysis Tool lets you choose any numeric fields and assesses the level of correlation between them. You can use the Pearson product-moment correlation, the Spearman rank-order correlation, or Hoeffding's D statistic to perform your analysis. You also have the option of running a more in-depth analysis of a target variable in relation to the other numeric fields. After you’ve run the tool, you will have two outputs:

  

 Association_AnalysisR.png

 

The R output will give you two or three tables, depending on whether you’ve selected “Target a field for more detailed analysis” in the tool’s configuration. If this checkbox is checked, you will get an additional table that lists the coefficients and their respective p-values for all the fields that are compared with the target variable, like so:

 

 table1.jpg

 

If you are unfamiliar with correlation coefficients or p-values, or you simply want to know more about them, I suggest you take a look at this resource.

 

In the second table, you have a matrix of correlation values of all the fields compared with one another.

 

 table2.jpg

 

And lastly, you get the matrix of p-values for those coefficients:

 

 table3.jpg

 

  Association_AnalysisI.png

 

The I output contains essentially the same information as the R output, but with a little more flair. It provides a correlation matrix in the form of an interactive heat map. When you select a cell in the heat map, a scatterplot of the two variables is displayed next to it.

 

scatterplot_cubic_relationship2.jpg

 

 

In general, association analysis is a great way to understand the relationships in your data (i.e., how your variables correlate) and to decide which variables to choose for predictive models such as regression. The tool offers three different correlation methods. We often get questions about which one to use and how the three differ, so I’ll go over them briefly.

 

Pearson product-moment correlation

The Pearson method measures the strength of linear dependence between two variables. This means you will see a stronger correlation between variables that rise or fall together at a constant rate, so that the points fall close to a straight line.

 

Example:

plinear.png

Strong positive linear correlation

 

 nlinear.png

Strong negative linear correlation
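As a rough illustration of what the Pearson coefficient measures, here is a minimal Python sketch. It is not the tool's own implementation; the function name `pearson` and the toy data are assumptions made up for this example. Points that lie exactly on a rising straight line give a coefficient of 1, and points on a falling straight line give -1.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: one rising line and one falling line.
x = [1, 2, 3, 4, 5, 6]
rising = [2 * v + 1 for v in x]
falling = [10 - 3 * v for v in x]

print(round(pearson(x, rising), 3))   # 1.0
print(round(pearson(x, falling), 3))  # -1.0
```

Real data rarely sits exactly on a line, so coefficients usually fall somewhere between these two extremes.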

 

Spearman rank-order correlation

The Spearman method is a nonparametric counterpart of the Pearson method. It measures the strength of any monotonic relationship. A monotonic relationship is one where, as one variable increases, the other consistently increases (or consistently decreases), but not necessarily at a constant rate. This includes relationships that are linear, but also exponential, logarithmic, and so on. Another way to think of a monotonic relationship is that the direction of change never reverses: the curve is either always rising or always falling, never both.

 

 monotonic.jpg

 

The two graphs on the left never change direction while the two graphs on the right do change direction and are considered non-monotonic.

 

At times you may get a good coefficient for a Pearson correlation between two variables but an even better one for a Spearman correlation. If this is the case, then it is possible that the relationship between the two variables is not truly linear. Therefore, we highly suggest that you consult the scatterplot of the data.
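To see how a Spearman coefficient can be stronger than the Pearson coefficient on the same data, the following Python sketch computes both for a curved but monotonic relationship. It is only an illustration under stated assumptions: the helper names (`pearson`, `average_ranks`, `spearman`) and the exponential toy data are invented for this example, and Spearman is computed here as the Pearson correlation of the ranks, with tied values sharing their average rank.

```python
from math import exp, sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation: Pearson applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: y decays exponentially as x grows.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [exp(-0.5 * v) for v in x]

print("Pearson: ", round(pearson(x, y), 3))   # negative, but weaker than -1
print("Spearman:", round(spearman(x, y), 3))  # -1.0, the ranks are exactly reversed
```

Because y always falls as x rises, the ranks line up perfectly and Spearman reports -1, while Pearson is pulled toward zero by the curvature.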

 

Example:

In this case, Displ and MPG have a strong Pearson coefficient of -0.85. Looking at the scatterplot, however, the relationship is better described by an exponential curve than by a straight line. Running a Spearman correlation on the same fields gives an even stronger coefficient of -0.91.

 

 scatterplot_cubic_relationship2.jpg

 

Hoeffding's D statistic

Hoeffding's D statistic is another nonparametric measure. It is useful for identifying non-monotonic relationships, like the two right-hand graphs in the monotonic example above.
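For readers curious about how D is computed, here is a minimal Python sketch based on the commonly documented rank formula for Hoeffding's D, scaled so that complete dependence gives 1. It is an assumption-laden illustration, not the code the tool runs: the function names `average_ranks` and `hoeffding_d` and the toy data are made up for this example. Note that the formula's denominator contains (n - 4), so this sketch rejects inputs with fewer than five observations.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def hoeffding_d(x, y):
    """Hoeffding's D, scaled so that complete dependence gives 1."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 5:
        raise ValueError("Hoeffding's D needs at least 5 observations")
    r = average_ranks(x)
    s = average_ranks(y)
    # q[i] = 1 + number of points below point i in both x and y;
    # a tie in one coordinate counts 1/2, a tie in both counts 1/4.
    q = []
    for i in range(n):
        count = 1.0
        for j in range(n):
            if j == i:
                continue
            if x[j] < x[i] and y[j] < y[i]:
                count += 1.0
            elif (x[j] == x[i] and y[j] < y[i]) or (x[j] < x[i] and y[j] == y[i]):
                count += 0.5
            elif x[j] == x[i] and y[j] == y[i]:
                count += 0.25
        q.append(count)
    d1 = sum((qi - 1) * (qi - 2) for qi in q)
    d2 = sum((ri - 1) * (ri - 2) * (si - 1) * (si - 2) for ri, si in zip(r, s))
    d3 = sum((ri - 2) * (si - 2) * (qi - 1) for ri, si, qi in zip(r, s, q))
    numerator = (n - 2) * (n - 3) * d1 + d2 - 2 * (n - 2) * d3
    denominator = n * (n - 1) * (n - 2) * (n - 3) * (n - 4)
    return 30 * numerator / denominator


# Hypothetical toy data: a monotonic curve and a V shape that changes direction.
x = [1, 2, 3, 4, 5, 6, 7]
monotonic = [v ** 3 for v in x]
v_shape = [abs(v - 4.2) for v in x]

print(hoeffding_d(x, monotonic))  # 1.0 for a perfectly monotonic relationship
print(hoeffding_d(x, v_shape))    # a small positive value for the V shape
```

On the D scale, values for non-monotonic patterns are typically much smaller than Pearson or Spearman coefficients for comparably strong linear patterns, so D is best compared against other D values rather than against the other two methods.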

 

Now that you know what all three look for, you may be asking “Well, which method should I pick?”

 

The easy answer, especially if you are not sure about your data, is all of them. Knowing how each variable correlates will give you a better understanding of which models to use and which variables you should or shouldn’t choose for those models. Before constructing any predictive model, you should always run Data Investigation Tools such as this one.

 

Things to look out for 

Since the Association Analysis Tool is an R-based macro, errors that come from it are almost always caused by a data issue.

 

Example: If you feed only 4 or fewer records into the Association Analysis Tool (which is bad practice anyway), you will get this error:

 

Error: Association Analysis (39): Tool #9: Error in rcorr(the.data, type = cor.type) : must have >4 observations

 

If you are getting errors from this tool or any other predictive tool, please see: Troubleshooting the Predictive Tools.

 

The tables and scatterplots in this article are from the Association Analysis sample workflow in Alteryx. You can find it under Help > Sample Workflows > Predictive Analytics > Association Analysis.

 

If you enjoyed this topic and want to learn more about predictive capabilities within Alteryx, check out the One Stop Shop for Predictive Resources.

  

And remember… correlation does not imply causation.

 

correlation.png 

 

By now, you should have expert-level proficiency with the Association Analysis Tool! If you can think of a use case we left out, feel free to use the comments section below! Consider yourself a Tool Master already? Let us know at Community@alteryx.com if you’d like your creative tool uses to be featured in the Tool Mastery Series.

 

Stay tuned with our latest posts every Tool Tuesday by following Alteryx on Twitter! If you want to master all the Designer tools, consider subscribing for email notifications.