# Tool Mastery | Pearson Correlation

Alteryx

This article is part of the Tool Mastery Series, a compilation of Knowledge Base contributions that introduce diverse working examples for Designer Tools. Here we’ll delve into uses of the Pearson Correlation Tool on our way to mastering Alteryx Designer.

When you are investigating a new dataset, you might be interested in measuring the correlation between different variables. There are two correlation methods available in Alteryx under the Data Investigation tab:

• Pearson Correlation: Indicates the strength and direction of a linear relationship between two variables
• Spearman Correlation: Assesses how well an arbitrary monotonic function could describe the relationship between two variables, without making any other assumptions about the particular nature of the relationship between the variables
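To make the difference between the two methods concrete, here is a minimal Python sketch (not how the Alteryx tools are implemented). It computes Pearson correlation from its textbook definition and treats Spearman correlation as Pearson correlation applied to ranks. The function names and the sample data are hypothetical, and the ranking helper assumes there are no tied values.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n by ascending value; assumes no ties to keep the sketch short."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(xs, ys):
    """Spearman correlation as Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: y rises with x, but along a curve rather than a line.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

pearson_r = pearson(x, y)
spearman_r = spearman(x, y)
print(f"Pearson:  {pearson_r:.3f}")
print(f"Spearman: {spearman_r:.3f}")
```

Because the relationship is perfectly monotonic but not linear, the Spearman value reaches 1 while the Pearson value stays below 1.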

## Pearson Correlation

We’ll dive into the Pearson Correlation tool in this article. It is the most frequently used correlation measure in practice. If someone tells you the “correlation” between two variables without specifying the method, they’re usually talking about the Pearson method.

Before using the tool, you’ll want to make sure the variables you’re analyzing are numeric (ints, floats, and doubles all work fine). Also, make sure you don’t have nulls in the variables you’re analyzing.
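If you want to run the same sanity checks outside of Designer, the idea can be sketched in plain Python. This is only an illustration: the `find_problem_fields` helper, the record layout (a list of dictionaries), and the field names are all hypothetical.

```python
import math
from numbers import Real

def find_problem_fields(rows, fields):
    """Return {field: reason} for fields that contain nulls or non-numeric values."""
    problems = {}
    for field in fields:
        for row in rows:
            value = row.get(field)
            # Treat both None and NaN as nulls.
            if value is None or (isinstance(value, float) and math.isnan(value)):
                problems[field] = "contains nulls"
                break
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, Real):
                problems[field] = "not numeric"
                break
    return problems

# Hypothetical records standing in for a dataset.
rows = [
    {"height": 170, "weight": 65.5, "name": "A"},
    {"height": 182, "weight": None, "name": "B"},
    {"height": 165, "weight": 58.0, "name": "C"},
]

problems = find_problem_fields(rows, ["height", "weight", "name"])
print(problems)
```

Fields reported here would need to be converted, filtered, or imputed before a correlation is calculated.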

It is usually a good idea to look at a scatterplot of your data to check that a linear relationship is a reasonable assumption. Pearson correlation isn’t a good choice if your data appears to follow a quadratic, logarithmic, or other non-linear relationship.

If you’ve decided to use the Pearson Correlation tool, the good news is it’s a pretty simple tool to configure. You really only have two choices to make.

1. What variables do you want to calculate correlations for?
2. Do you want to calculate correlations or covariances?

The tool generates correlations between all pairs of the variables you specify, including each variable with itself. In the example above, that means we’ll actually be calculating 9 correlations (a 3 × 3 grid, which implies three selected variables), and we get a correlation matrix as our output.

The Pearson Correlation tool can also calculate covariances if you’d prefer. Think of covariances as an “unstandardized” correlation. It’s still a measure of the relationship between variables, but it’s not adjusted for the variance (i.e. “spread”) of each variable.

If you calculate correlations, you’ll get values between -1 and 1 as your output. There are different philosophies for deciding whether a correlation is weak, moderate, or strong, and the right thresholds depend on your use case. As a rule of thumb, though, you can think of magnitudes under 0.3 as weak, 0.3 to 0.6 as moderate, and over 0.6 as strong, with the sign indicating a positive or negative relationship.
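The rule of thumb can be written down as a small Python helper. The function name is hypothetical, and the thresholds are simply the ones quoted above (with exactly 0.3 and 0.6 counted as moderate), so adjust them to suit your use case.

```python
def describe_correlation(r):
    """Label a correlation coefficient using the 0.3 / 0.6 rule of thumb."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")

    magnitude = abs(r)
    if magnitude < 0.3:
        strength = "weak"
    elif magnitude <= 0.6:
        strength = "moderate"
    else:
        strength = "strong"

    # The sign carries the direction of the relationship.
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"

    return strength, direction

for value in (0.85, -0.45, 0.10):
    print(value, describe_correlation(value))
```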