
Tool Mastery | Association Analysis

Ozzie
Alteryx

Association_Analysis.png

This article is part of the Tool Mastery Series, a compilation of Knowledge Base contributions that introduce diverse working examples for Designer tools. Here we’ll delve into uses of the Association Analysis Tool on our way to mastering Alteryx Designer:

The Association Analysis Tool lets you choose any numerical fields and assesses the level of correlation between them. You can use the Pearson product-moment correlation, the Spearman rank-order correlation, or Hoeffding's D statistic to perform your analysis. You also have the option of doing an in-depth analysis of a target variable in relation to the other numerical fields. After you’ve run the tool, you will have two outputs:

Association_AnalysisR.png

The R output will give you two or three tables, depending on whether you’ve selected “Target a field for more detailed analysis” in the tool’s configuration. If this checkbox is checked, you will get a table that lists the correlation coefficients and their respective p-values for every field compared with the target variable, like so:

table1.jpg

If you are unfamiliar with correlation coefficients or p-values, or you simply want to know more about them, I suggest you take a look at this resource.

In the second table, you have a matrix of correlation values of all the fields compared with one another.

table2.jpg

And lastly, you get the matrix of p-values for those coefficients:

table3.jpg

Association_AnalysisI.png

The I output is basically the same as the R output but with a little more flair. It provides you with a correlation matrix in the form of an interactive heat map. When you select a cell, a scatterplot of the two variables will be displayed next to it.

scatterplot_cubic_relationship2.jpg

In general, the Association Analysis Tool is a great way to understand the relationships in your data (i.e. how your variables correlate) and to decide which variables to choose for predictive models such as regression. The tool offers three different methods of correlation. We often get questions about which one to use and how the three differ, so I’ll go over them briefly.

Pearson product moment correlation

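The Pearson method measures the strength of the linear relationship between two variables. You will see a stronger correlation between variables that rise or fall together at a constant rate, that is, when their scatterplot falls close to a straight line. The coefficient ranges from -1 (perfect negative linear relationship) to +1 (perfect positive linear relationship).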

Example:

plinear.png

Strong positive linear correlation

nlinear.png

Strong negative linear correlation
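As a rough illustration of what the Pearson coefficient computes, the sketch below implements the textbook formula in plain Python. It is not the code the tool runs, and the sample values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant field")
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5, 6]
rising = [2 * v + 1 for v in x]    # increases at a constant rate
falling = [10 - 3 * v for v in x]  # decreases at a constant rate

r_rising = pearson(x, rising)    # perfectly linear, so the result is 1 up to rounding
r_falling = pearson(x, falling)  # perfectly linear and decreasing, so -1 up to rounding
```

Real data rarely sits exactly on a line, so in practice you will see values strictly between -1 and 1.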

Spearman rank-order correlation

The Spearman method is a nonparametric version of the Pearson method: it is the Pearson correlation computed on the ranks of the values. It measures the strength of any monotonic relationship, meaning one in which, as one variable increases, the other consistently increases or consistently decreases, but not necessarily at a constant rate. This covers relationships that are linear as well as exponential, logarithmic, and so on. Another way to think of a monotonic relationship is that the direction of change never reverses: it is always increasing or always decreasing, never both.

monotonic.jpg

The two graphs on the left never change direction while the two graphs on the right do change direction and are considered non-monotonic.

At times you may get a strong coefficient for a Pearson correlation between two variables but an even stronger one for a Spearman correlation. If this is the case, then it is possible that the relationship between the two variables is not truly linear. Therefore, we highly suggest that you consult the scatterplot of the data.

Example:

Displ and MPG in this case have a strong Pearson coefficient of -0.85. But looking at the scatterplot, the relationship is better described by an exponential curve than by a straight line. Running a Spearman correlation gives an even stronger coefficient of -0.91.

scatterplot_cubic_relationship2.jpg
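The same effect can be reproduced with a small Python sketch. It uses an invented, exponentially decaying series rather than the Displ and MPG data from the sample workflow, and it computes Spearman as the Pearson correlation of the ranks. The helper names are illustrative, not part of any Alteryx API.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


def spearman(xs, ys):
    """Spearman rank-order correlation: Pearson applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [math.exp(-0.5 * v) for v in x]  # always decreasing, but along a curve

r_pearson = pearson(x, y)    # strong, but weakened by the curvature
r_spearman = spearman(x, y)  # -1 up to rounding: the ranks line up perfectly
```

Because the curve never changes direction, Spearman reports a perfect monotonic relationship, while Pearson is pulled toward zero by the departure from a straight line.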

Hoeffding's D statistic

Hoeffding's D statistic is another nonparametric test that is useful for identifying non-monotonic relationships, like the two right-hand graphs in the monotonic example above.
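For readers who want to see how D behaves, here is a hedged Python sketch of the commonly documented rank-based formula for Hoeffding's D, scaled by 30 so that a perfect monotonic relationship scores 1. It is a simplification that assumes no tied values and at least five observations; it is not taken from the tool's macro, and the sample data is invented.

```python
def hoeffding_d(xs, ys):
    """Hoeffding's D (scaled by 30) for paired data without tied values."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("both fields need the same number of values")
    if n < 5:
        raise ValueError("Hoeffding's D needs at least 5 observations")
    if len(set(xs)) != n or len(set(ys)) != n:
        raise ValueError("this sketch does not handle tied values")

    # R and S: 1-based rank of each x and each y within its own field.
    r = [1 + sum(other < v for other in xs) for v in xs]
    s = [1 + sum(other < v for other in ys) for v in ys]
    # Q: 1 + number of points that are smaller in both x and y.
    q = [
        1 + sum(xs[j] < xs[i] and ys[j] < ys[i] for j in range(n))
        for i in range(n)
    ]

    d1 = sum((qi - 1) * (qi - 2) for qi in q)
    d2 = sum((ri - 1) * (ri - 2) * (si - 1) * (si - 2) for ri, si in zip(r, s))
    d3 = sum((ri - 2) * (si - 2) * (qi - 1) for ri, si, qi in zip(r, s, q))

    numerator = (n - 2) * (n - 3) * d1 + d2 - 2 * (n - 2) * d3
    denominator = n * (n - 1) * (n - 2) * (n - 3) * (n - 4)
    return 30 * numerator / denominator


x = list(range(-5, 6))
u_shape = [(v - 0.25) ** 2 for v in x]  # falls, then rises: non-monotonic
straight = [2 * v + 1 for v in x]       # perfectly monotonic

d_u = hoeffding_d(x, u_shape)      # clearly above 0 despite the change of direction
d_line = hoeffding_d(x, straight)  # 1 up to rounding
```

On the U-shaped data, the Pearson and Spearman coefficients would both be close to zero, while D stays clearly positive, which is why D is useful for spotting dependence that changes direction.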

Now that you know what all three look for, you may be asking “Well, which method should I pick?”

The easy answer, especially if you are not sure about your data, is all of them. Knowing how each variable correlates will give you a better understanding of which models to use and which variables you should or shouldn’t choose for those models. Before constructing any predictive model, you should always use Data Investigation tools such as this one.

Things to look out for

Since the Association Analysis Tool is an R-based macro, errors that come from it are almost always caused by a data issue.

Example: If you feed 4 or fewer records into the Association Analysis Tool (you shouldn’t be doing this anyway, since it’s bad practice), you will get this error:

Error: Association Analysis (39): Tool #9: Error in rcorr(the.data, type = cor.type) : must have >4 observations

If you are getting errors from this tool or any other predictive tool, please see: Troubleshooting the Predictive Tools.

The tables and scatterplots in this article are from the Association Analysis sample workflow in Alteryx. You can find it under Help > Sample Workflows > Predictive Analytics > Association Analysis.

If you enjoyed this topic and want to learn more about predictive capabilities within Alteryx, check out the One Stop Shop for Predictive Resources.

And remember… correlation does not imply causation

correlation.png

By now, you should have expert-level proficiency with the Association Analysis Tool! If you can think of a use case we left out, feel free to use the comments section below! Consider yourself a Tool Master already? Let us know at Community@alteryx.com if you’d like your creative tool uses to be featured in the Tool Mastery Series.


Comments
bobd
8 - Asteroid

Does this "general" tool still exist? Because in Alteryx developer I only see two of the tools above, each of which has its own tool.

bobd_0-1666856165100.png

 

lepome
Alteryx Alumni (Retired)

@bobd It looks to me as though you have not installed the Predictive tools (RInstaller or RNonAdminInstall) since you last updated or patched your Designer. Go to https://downloads.alteryx.com to get the version that matches what you are running in type and (YYYY.R) number.

bobd
8 - Asteroid

Thanks @LisaL, this worked

AbiramiJothi
7 - Meteor

Link for correlation does not imply causation doesn't work.