Tool Mastery | Pearson Correlation

DaveF
Alteryx Alumni (Retired)
Pearson Correlation.png

This article is part of the Tool Mastery Series, a compilation of Knowledge Base contributions to introduce diverse working examples for Designer Tools. Here we’ll delve into uses of the Pearson Correlation Tool on our way to mastering the Alteryx Designer:

When you are investigating a new dataset, you might be interested in measuring the correlation between different variables. There are two different correlation methods available in Alteryx under the Data Investigation tab:

  • Pearson Correlation: Indicates the strength and direction of a linear relationship between two variables
  • Spearman Correlation: Assesses how well an arbitrary monotonic function could describe the relationship between two variables, without making any other assumptions about the particular nature of the relationship between the variables
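To make the difference between the two methods concrete, here is a minimal sketch in plain Python. It is not how the Alteryx tools are implemented; the helper names `pearson`, `ranks`, and `spearman` are made up for illustration, and the rank helper assumes there are no tied values. The data follow a monotonic but non-linear pattern (y = x cubed), so the Spearman value is a perfect 1 while the Pearson value falls short of 1.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Rank values from 1..n (illustrative helper; assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # monotonic, but clearly not a straight line

pearson_r = pearson(x, y)
spearman_r = spearman(x, y)
print("Pearson: ", round(pearson_r, 3))
print("Spearman:", round(spearman_r, 3))
```

Because Spearman only looks at the ordering of the values, any strictly increasing relationship scores 1, whereas Pearson rewards only the straight-line part of the pattern.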

Pearson Correlation

We’ll dive into the Pearson Correlation tool in this article. It is the most frequently used correlation measure in practice. If someone tells you the “correlation” between two variables without specifying the method, they’re usually talking about the Pearson method.

Before using the tool, you’ll want to make sure the variables you’re analyzing are numeric (ints, floats, and doubles all work fine). Also, make sure you don’t have nulls in the variables you’re analyzing.
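If you want to see what that pre-check amounts to, the sketch below flags records that would cause trouble. It is plain Python rather than anything Alteryx-specific, and the field names `sales` and `visits`, the sample records, and the helper names are all hypothetical.

```python
import math

def is_clean_number(value):
    """True for ints and floats that are not NaN (booleans are rejected)."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return not math.isnan(value)

def problem_rows(rows, fields):
    """Return the indexes of rows with a null or non-numeric value in any field."""
    return [
        index
        for index, row in enumerate(rows)
        if not all(is_clean_number(row.get(field)) for field in fields)
    ]

# Hypothetical records: one null and one text value hiding in numeric fields.
rows = [
    {"sales": 120.0, "visits": 30},
    {"sales": None, "visits": 25},
    {"sales": 95.5, "visits": "n/a"},
    {"sales": 110.0, "visits": 28},
]

bad = problem_rows(rows, ["sales", "visits"])
clean = [row for index, row in enumerate(rows) if index not in bad]
print("Rows to fix or drop:", bad)
print("Rows ready for correlation:", len(clean))
```

Whether you drop the flagged rows or impute values for them is a judgment call that depends on your dataset.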

It is usually a good idea to look at a scatterplot of your data to make sure that a linear relationship looks like a reasonable assumption. Pearson correlation isn’t a good choice if your data appears to have a quadratic, logarithmic, or other non-linear relationship.
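A small sketch shows why the scatterplot check matters. The data below are invented: y is exactly x squared, so the two variables are perfectly related, yet the Pearson correlation comes out as zero because the relationship is not linear. The `pearson` helper is an illustrative plain-Python implementation, not the tool’s own code.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # a perfect quadratic (U-shaped) relationship

r_quadratic = pearson(x, y)
print("Pearson correlation for y = x^2:", r_quadratic)
```

A value near zero here does not mean the variables are unrelated, only that a straight line describes the relationship poorly.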

If you’ve decided to use the Pearson Correlation tool, the good news is it’s a pretty simple tool to configure. You really only have two choices to make.

  1. What variables do you want to calculate correlations for?
  2. Do you want to calculate correlations or covariances?

img1.png

img2.png

The tool generates correlations between all combinations of the variables you specify, so in the example above we’ll actually be calculating 9 correlations and getting a correlation matrix as our output.

img3.png
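To illustrate how three variables lead to nine values, here is a rough plain-Python sketch that builds the same kind of matrix. The column names `price`, `ad_spend`, and `returns` and their values are invented for this example and are not the data from the attached workflow.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical columns, five records each.
data = {
    "price": [10.0, 12.0, 15.0, 11.0, 18.0],
    "ad_spend": [200.0, 260.0, 310.0, 220.0, 400.0],
    "returns": [9.0, 7.0, 6.0, 8.0, 3.0],
}

names = list(data)
# Every variable is paired with every variable, itself included: 3 x 3 = 9 values.
matrix = {a: {b: pearson(data[a], data[b]) for b in names} for a in names}

for a in names:
    print(a.ljust(10), "  ".join(f"{matrix[a][b]:6.3f}" for b in names))
```

The diagonal is always 1 (each variable against itself) and the matrix is symmetric, so only three of the nine values carry distinct information.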

The Pearson Correlation tool can also calculate covariances if you’d prefer. Think of a covariance as an “unstandardized” correlation. It’s still a measure of the relationship between variables, but it isn’t scaled by the spread (standard deviation) of each variable, so its magnitude depends on the units of your data.

img4.png
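The sketch below illustrates the “unstandardized” idea with invented height and weight values. It assumes the sample covariance (dividing by n - 1); whether the tool uses that exact divisor is an assumption here. Converting height from metres to centimetres multiplies the covariance by 100 but leaves the correlation unchanged.

```python
import math

def covariance(xs, ys):
    """Sample covariance (divides by n - 1; assumed convention)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical measurements.
height_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weight_kg = [55.0, 68.0, 70.0, 80.0, 92.0]
height_cm = [h * 100 for h in height_m]  # same heights, different unit

cov_m = covariance(height_m, weight_kg)
cov_cm = covariance(height_cm, weight_kg)
corr_m = correlation(height_m, weight_kg)
corr_cm = correlation(height_cm, weight_kg)

print("Covariance (m):  ", round(cov_m, 4))
print("Covariance (cm): ", round(cov_cm, 4))
print("Correlation (m): ", round(corr_m, 4))
print("Correlation (cm):", round(corr_cm, 4))
```

This is why correlations are easier to compare across variable pairs, while covariances are mainly useful when the original units matter.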

If you calculate correlations, you’ll get values between -1 and 1 as your output. There are different philosophies for determining whether a correlation is weak, moderate, or strong, and the right thresholds depend on your use case. As a rule of thumb, though, you can think of magnitudes under 0.3 as weak, 0.3 to 0.6 as moderate, and over 0.6 as strong, with the sign indicating a positive or negative relationship.
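The rule of thumb above can be written as a small helper. This is only a sketch of those thresholds; the function name `describe_correlation` is made up, and you may well want different cut-offs for your own use case.

```python
def describe_correlation(r):
    """Label a correlation using the 0.3 / 0.6 rule of thumb."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")

    magnitude = abs(r)
    if magnitude < 0.3:
        strength = "weak"
    elif magnitude <= 0.6:
        strength = "moderate"
    else:
        strength = "strong"

    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction

for value in (0.85, -0.45, 0.1):
    strength, direction = describe_correlation(value)
    print(f"{value:+.2f}: {strength}, {direction}")
```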

If you’d like to experiment with the data used in this article, download the attached workflow!

By now, you should have expert-level proficiency with the Pearson Correlation Tool! If you can think of a use case we left out, feel free to use the comments section below! Consider yourself a Tool Master already? Let us know at community@alteryx.com if you’d like your creative tool uses to be featured in the Tool Mastery Series.

Stay tuned with our latest posts every Tool Tuesday by following Alteryx on Twitter! If you want to master all the Designer tools, consider subscribing for email notifications.
