Alter Everything

A podcast about data science and analytics culture.
MaddieJ
Alteryx Alumni (Retired)

In the second episode of our “Data [in the] Sandbox” mini-series, Susan tells Maddie all about correlation vs. causation... we know it sounds complicated! But it’s actually really cool and important, especially for understanding stats you might hear in the news, or fun facts like the statistic Maddie learned: that shark attack rates go up when ice cream sales rates go up. Does this mean that sharks are jumping out of the water and stealing ice cream?! Are people dumping ice cream into the ocean to share the tasty summer treat with their shark friends?! Tune in to find out!

 


Panelists

 

Maddie Johannsen - @MaddieJ, LinkedIn, Twitter

Susan Sivek - @SusanCS, LinkedIn, Twitter

 


Transcript

Episode Transcription

MADDIE: 00:00

Hey, everyone. It's Maddie Johannsen. Welcome back to our mini-series called Data [in the] Sandbox. [music] In this episode, my friend Susan will tell me all about something called correlation versus causation. I know, I know. It sounds complicated, but it's actually really cool. Plus, Susan made it fun. So let's get started. [music]

MADDIE: 00:29

Hey, Susan. You said this time we'd talk about sharks. So I read a little about sharks, and here's a fun fact I learned today. Did you know that when people buy more ice cream, shark attacks go up? [music] Like more people getting attacked by sharks when ice cream sales go up. Just-- what?

SUSAN: 00:49

Oh. Well, that is very interesting.

MADDIE: 00:53

It's crazy. Like how do the sharks know? Are they leaping out of the water to get ice cream cones from people's hands? That is just bizarre and scary. I don't want to eat ice cream on the beach anytime soon. Oof. Wow. What a sad thought. [laughter]

SUSAN: 01:07

Yeah, that is kind of scary, but have you actually heard of sharks jumping out of the water to steal ice cream from people?

MADDIE: 01:14

Hmm. Well, no, not exactly, but if the ice cream purchases go up and the shark attacks go up, they must be related, right? I wonder which flavor the sharks like best. [laughter] Maybe strawberry would be safe? Because I bet they don't like strawberry, but I don't really know.

SUSAN: 01:30

Yeah. Well, okay, before you get too worried about this, let's look at it from a data perspective.

MADDIE: 01:37

Okay. Yeah, that sounds good. So let's take a moment for me to remember. Last time we talked, we figured out that data is not just what I use to get the internet on my phone, but it's information. All kinds of information for things I could count, or things people have said or written, or even photos or stuff on a map.

SUSAN: 01:55

Yeah, exactly. And you want to be sure that that information is carefully measured and collected so it's accurate. You want to be sure that when you analyze it, you can trust the information.

MADDIE: 02:06

Totally. And we also talked about how I have to come up with good questions based on getting some information first.

SUSAN: 02:13

Yeah. That will help figure out what question you should ask and what information you need to collect. So like how we only looked at our running times and our snacks, but we forgot about all the other possible things, like the weather, when we're trying to figure out how to start running faster.

MADDIE: 02:28

Snacks are so good though. So good.

SUSAN: 02:31

Yeah, I totally agree. But that's really just one potential variable that could affect your running.

MADDIE: 02:36

Wait. One potential what?

SUSAN: 02:39

Oh, yeah. One potential variable. So what that means is it's one of the things that could change, or vary, in how you're getting ready to run each day. Some days you have a snack; other days you don't. Another variable might be the temperature outside. How do you think temperature might change your running?

MADDIE: 02:58

Well, I think I'd be slower when it's hot and faster when it's cool.

SUSAN: 03:02

Yeah, me too. Definitely. And we'd probably measure the temperature in degrees, right? Another variable might be which shoes you choose to run in. Running in your nice new running shoes is probably going to be faster than running in flip flops or running in high heels.

MADDIE: 03:19

Ooh, yeah, I'll definitely pass on that. That sounds like a recipe for disaster.

SUSAN: 03:23

Yeah. Ouch. Definitely.

MADDIE: 03:26

[music] Sharks stealing my ice cream; that would also be a disaster. Let's get back to that situation. I need to know how to protect myself and my favorite summer treat.

SUSAN: 03:34

For sure, yeah. That is definitely important. So let's get that idea of a variable in there now. You identified two things that go up and down when you first said, "When people buy more ice cream, shark attacks go up." So what's one of the things that can change in that statement, just something that goes up or down?

MADDIE: 03:55

Well, the shark attacks. There can be more attacks or fewer attacks.

SUSAN: 03:59

Exactly. So let's say that the number of shark attacks is one variable here. What else varies here? [music]

MADDIE: 04:09

I guess the amount of ice cream people are buying?

SUSAN: 04:11

Absolutely. That would be our second variable.

MADDIE: 04:14

Okay. Two variables: ice cream sales and shark attacks.

SUSAN: 04:18

Perfect. Yes. Okay, so you think right now that ice cream sales are causing more shark attacks. Like you said that sharks somehow know that there's more ice cream out there, and they are just going for it. They are attacking the humans and just getting all the tasty treats.

MADDIE: 04:36

Yes. Sales go up; attacks go up. It's terrifying.

SUSAN: 04:40

Well, let's slow down a minute. Maybe we don't have to be so scared. What if I told you - I'm going to blow your mind here, so hang in there - what if the shark attacks were causing the ice cream sales?

MADDIE: 04:53

What?

SUSAN: 04:54

Yeah. I mean you said they both go up, right? So how do you know it wasn't the other way around? Like the shark attacks go up, people get scared, and so they're like, "I'm just going to relax and eat this nice ice cream on the beach instead of going into the water." So the shark attacks go up and they cause the ice cream consumption to go up.

MADDIE: 05:16

That seems ridiculous.

SUSAN: 05:18

Well, yeah, I agree. But so does the idea that somehow sharks know how much ice cream is being sold and they're just leaping out of the water as the sales go up. Do they have shark spies in the ice cream companies? Sharks sneaking around following ice cream trucks and tracking how many people buy ice cream?

MADDIE: 05:39

Oh, okay. Well, you may have a point there. That does sound kind of unlikely.

SUSAN: 05:43

Yeah. And maybe there's another variable. Something else that changes, that is also changing when those shark attacks go up and when ice cream sales go up. Is there maybe some other big thing, like maybe a weather pattern, that changes when those two variables are also changing?

MADDIE: 06:04

Hmm.

SUSAN: 06:05

So when would you guess that people eat the most ice cream? What kind of weather?

MADDIE: 06:10

Oh, hot weather, like in the summer.

SUSAN: 06:12

Yeah. And when do people probably go to the beach the most and get in the water around sharks the most?

MADDIE: 06:19

Summer, like now. Now's the time to go to the beach. Oh, yeah.

SUSAN: 06:25

So maybe the weather or the season is a third variable here. Maybe we could actually explain the connection between ice cream sales and shark attacks by just realizing that the outdoor temperature rising is also something those two patterns have in common.

MADDIE: 06:41

That makes sense, I guess. So the weather gets hotter, people want more ice cream, and shark attacks go up because more people are at the beach.

SUSAN: 06:49

Yeah, yeah. You know, we have a special name for this idea in data analytics. When we see that two things are related, like they go up and down together, like the temperature and the ice cream sales, we call that correlation. So temperatures and ice cream sales are correlated, we could say. They both go up and down together during the year. So higher temperatures, more ice cream; lower temperatures, less ice cream.

MADDIE: 07:16

So higher temperatures make people eat more ice cream? All right. I guess that makes more sense.

SUSAN: 07:21

Well, not exactly. I mean, do you feel like you have to eat more ice cream in the summer?

MADDIE: 07:27

Yeah, absolutely I do. [laughter] Well, I mean, no, I guess I don't have to. [laughter]

SUSAN: 07:33

Right. Well, the higher temperatures don't necessarily make you eat more ice cream. Or we might say they don't cause you to eat more ice cream. And this is a really important idea in data analytics. Just because those variables are correlated, like temperatures and ice cream sales, that doesn't mean that one caused the other.

MADDIE: 07:55

Okay. So let me say that again just to make sure I get it. Variables-- variables-- blech. [laughter] Variables can be correlated, but that doesn't mean that one definitely caused the other to go up or down.

SUSAN: 08:14

But we figured out that the summer temperatures are another possible connection with ice cream sales going up and that hotter weather sending more people to the beach means maybe shark attacks go up. So there's a different variable, the weather, that could explain this connection between ice cream sales and shark attacks.
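
[Editor's illustration, not part of the recorded episode] The idea Susan describes here, a third variable that explains why two others move together, can be sketched with made-up numbers. In the Python sketch below, the temperatures, the sales formula, the shark attack formula, and the helper names pearson and remove_trend are all hypothetical and chosen only for illustration. The data are built so that temperature drives both ice cream sales and shark attacks, and neither one affects the other. The sketch then compares the plain correlation with the correlation left after the temperature trend is removed from both variables (a partial correlation).

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def remove_trend(ys, xs):
    """Return what is left of ys after subtracting a straight-line fit on xs."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [y - (mean_y + slope * (x - mean_x)) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the made-up data is repeatable

# Hypothetical data for 365 days. Temperature is the only shared driver:
# sales and attacks each depend on temperature plus their own random noise.
temperature = [rng.uniform(10, 35) for _ in range(365)]
ice_cream_sales = [20 * t + rng.gauss(0, 60) for t in temperature]
shark_attacks = [0.2 * t + rng.gauss(0, 1) for t in temperature]

# Correlation between sales and attacks, ignoring temperature.
raw = pearson(ice_cream_sales, shark_attacks)

# Correlation between what is left of each once temperature is accounted for.
adjusted = pearson(remove_trend(ice_cream_sales, temperature),
                   remove_trend(shark_attacks, temperature))

print(f"sales vs. attacks: {raw:.2f}")
print(f"sales vs. attacks, temperature removed: {adjusted:.2f}")
```

With data built this way, the first number should come out strongly positive and the second close to zero, because the link between sales and attacks runs entirely through temperature. Real data is messier, and removing a straight-line trend is only one simple way to account for a third variable.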

MADDIE: 08:34

That probably makes more sense than the idea that sharks understand ice cream sales, I guess. But it maybe is a little less interesting. [laughter]

SUSAN: 08:42

Yeah. Sorry about that. It is a little less dramatic and weird. But this is another good thing about using data to figure things out. You were kind of worried about the sharks stealing your ice cream, right? Now you don't have to worry.

MADDIE: 08:55

Yeah, that's great. Hooray for data, helping me feel like my ice cream is safe.

SUSAN: 09:00

Well, now remember, it's not safe from hot weather. It will melt. You have to be careful.

MADDIE: 09:06

Ah. Okay, then. [laughter] So we have figured out now what variables are, things that can change, and that sometimes they do change together, like ice cream sales and shark attacks. But, as we talked about, that doesn't mean that one of those variables caused the other. They're just correlated.

SUSAN: 09:24

Exactly. This is just like the example we talked about in our last episode, if you remember. You noticed that if you ate a snack, you ran faster. So the snack and your speedy runs were correlated, but we also realized that it might have just been that you were less stressed or the weather was cooler or maybe you were wearing better shoes and that was why you ran faster. There were other potential causes for the faster running than just the snacks.

MADDIE: 09:53

I see how that works. It seems like it's really important to see how sometimes things can be connected but how one isn't actually making the other one happen.

SUSAN: 10:01

Absolutely, yeah. And if you can keep this one concept in mind, that correlation doesn't prove causation, that correlation alone doesn't show that one thing made another thing happen, you will be so far ahead of lots of people who misunderstand this kind of information. So often, you hear people say things in everyday life where they are claiming that one thing must have caused another, but really they're just seeing two things, two variables, that were correlated.

MADDIE: 10:32

So [music] when someone tells me I should eat vegetables because they'll make me healthy, I can say, "Nah, you just think vegetables are causing me to be healthy, but they're only correlated"?

SUSAN: 10:42

Well, nice try. But no, not exactly. There's actual research, real scientific studies, that carefully analyzed how vegetables can actually cause you to be healthier. That's a whole different topic we can't get into here, but you'd have to do actual research to show how one thing causes another instead of just being correlated. So, I'm sorry, but you still have to eat the veggies.

MADDIE: 11:05

All right. Well, at least I can feel better about the ice cream not bringing on a shark attack now.

SUSAN: 11:10

True, true.

MADDIE: 11:11

Cool. So we've covered now that data is information in lots of different forms, that I need to collect it and analyze it carefully, and that things that change, variables, can be connected or correlated but not necessarily cause each other to change.

SUSAN: 11:27

Right. Definitely. And also that when you're thinking about what data you need, like when you need to know about running or about shark attacks, you'll also want to get some good information from experts on what you're curious about. That way you can ask the right questions and get the best information to answer those questions.

MADDIE: 11:45

Awesome. I really feel like I'm getting a handle on what data is and how I really have to think about it carefully. It's kind of like solving a puzzle, and I can find all kinds of questions that I can try and answer with data like this.

SUSAN: 11:58

Totally. Yeah. It takes a lot of effort, but there are so many interesting and important questions in our daily lives that we can think about with data. Everything from "Should I eat a snack before I run?" to "Which movie should I watch tonight?" to "What's the best place to put our town's new community pool?" to "What kind of educational system should our whole country have?" It's everything from the little questions, that are still really important, to the biggest, grandest questions that we need to answer as humans. It's actually very cool.

MADDIE: 12:32

Yeah, definitely. Sounds really, really cool. So, all right. [music] Next time, do we finally get to talk about psychics?

SUSAN: 12:38

Well, sort of. We'll talk about making predictions next time and how we can use data to try to figure out what might happen in the future.

MADDIE: 12:46

Can't wait.

 

[music]

MADDIE: 12:50

Thanks for listening to Data [in the] Sandbox. This mini-series was written by Susan Currie Sivek, and our theme music is by Andy Uttley. If you know a K-12 educator or student or are one yourself, we're excited to offer a new learning and certification program designed for kids and young adults. To sign up or learn more, visit alteryx.com/forgood. That's alteryx.com slash F-O-R-G-O-O-D. Catch you next time. [music]

 

This episode of Alter Everything was produced by Maddie Johannsen (@MaddieJ) and @TreyW.
Special thanks to @SusanCS for writing this episode, @andyuttley for the theme music track, and @jeho for our album artwork.