Data Science Mixer

Tune in for data science and cocktails.
MaddieJ
Alteryx Community Team

How can a well-designed data visualization lead to deeper understanding of data? Data visualization expert Alberto Cairo joins us for an extended conversation about creating, revising and enjoying thoughtful data viz.

 

 



Cocktail Conversation

 

Alberto talked about how data visualizations help us see things we otherwise couldn't see in our data such as trends and patterns that might not be visible when your data are in a tabular form, but that jump off the screen when you make a visualization. Have you experienced a moment when a visualization helped you see something otherwise unseen in your data, maybe with important or useful consequence?

 

Join the conversation by commenting below!

 


 


Transcript

 


SUSAN: 00:00

All right, so we are rolling. On Data Science Mixer, we try to bring you people and ideas that will change the way you do data science. In today's interview, my second with data visualization expert Alberto Cairo, I think we fulfill that promise. I'm Susan Currie Sivek, the data science journalist for the Alteryx community. Today, I have a second chat with Alberto for you, following up on our first episode that we recorded for the Alteryx Virtual Global Inspire conference. We explore still more fascinating ideas around doing data viz successfully, including: how to display uncertainty, why you might need a user guide for your visualizations, and how to revise visualizations effectively to improve them. We even get into visualizations in games and why taking up art could make you even better at data viz. If you didn't hear our first conversation, you should definitely check it out after this episode. But it's not like a prerequisite for this one. Let's meet Alberto again and jump right in.

ALBERTO: 01:08

So I am Alberto Cairo. I am a he, and I am a professor of visualization, information design, information graphics, explanation graphics at the University of Miami. My title is very long. So it's Knight Chair in Visual Journalism of the University of Miami and Director of Visualization at the University of Miami Institute for Data Science and Computing. So that's a very long title.

SUSAN: 01:35

Excellent. It's an awesome title.

ALBERTO: 01:36

So I usually say I am a-- yeah, it's a great title, as is common in academia, but I usually say I'm just someone who makes graphics to inform the public, both data visualization and also pictorial graphics or visual explanations. So I make those for a living, but I also teach how to design them. That's what I do.

SUSAN: 01:58

And you've written a few books on the topic. Could you tell us a little bit about those?

ALBERTO: 02:02

Sure. Yes. I've written several books about-- more specifically, data visualization. The latest one is How Charts Lie, which is my first book for the general public. And despite the title, it's not a book about how to lie with charts, it's the opposite. It's about how not to lie with charts. And I usually joke also that the title of the book could have been not "How Charts Lie," but, "How Do We Avoid Lying to Ourselves with the Charts That We See Every Day." So it's a book about how-- I mean it certainly-- I certainly cover how bad actors misuse charts to mislead the public. But the book is mostly about how any reader, and that includes myself, can take a graphic that is otherwise perfectly designed or designed with the best of intentions and can still misinterpret it just because we all tend to project our own opinions and values on the charts that we see every day. So it's a book that warns about that. And prior to How Charts Lie, I also wrote a couple of books more for professionals who want to produce this type of work. The other one is, for example, The Truthful Art, which is the one that precedes How Charts Lie. And then the first one is The Functional Art.

SUSAN: 03:22

Yeah, fantastic books that you see quoted all over the place and used as references all the time. So great stuff. What was the difference for you in writing for a more general audience about data visualization versus in your previous two books, which are more for experts in the field?

ALBERTO: 03:38

Well, in my case, there was really not a huge difference, mainly because the previous books prior to How Charts Lie, the book for the general public, I wrote them with journalists in mind, not necessarily with data scientists in mind, but with journalists. And unfortunately, we journalists are usually not very well-trained on statistics and data science, etc. So I needed to explain things as clearly as possible just because I was talking to an audience that is quite general, right? It's only that, particularly The Truthful Art, funnily, even if it is a book that I wrote for journalists and graphic designers to teach a little bit about descriptive statistics and how to visualize them, it ended up being picked up by data science programs to teach elementary data visualization and elementary statistics to data science students, which is quite flattering, honestly, for someone who is not a data scientist or a statistician, that professionals in the field think that the book is still valuable. How Charts Lie is more, again, for the general public. So I needed to spend a little bit more time explaining things, perhaps a little bit more clearly, not taking a lot of things for granted. But certainly there was not a huge difference. I'm not a very academic academic, so to speak. My mind, my style of writing is very casual. It's very informal. And that has not changed in any of the books.

SUSAN: 05:10

That's funny to hear. As a former academic, I definitely appreciate that comment.

ALBERTO: 05:15

There were actually there-- I have this anecdote. One of the reviews that appears on Amazon of my previous book, The Truthful Art, the one that I said was adopted by some data science programs, was written by an actual data scientist, by a biostatistician, who wrote this review saying, "This book," meaning The Truthful Art, "is in no way mathematically rigorous," he said. He begins by saying that, "But the explanations are so clear that I'm still going to make this book part of my curriculum because it's a good starting point to understand the concepts at a very general level. And then once you have understood the concepts, then I can give you all these other readings that will teach you about the concepts from a more rigorous perspective," a mathematically rigorous perspective, so to speak, right?

SUSAN: 06:04

Yeah, that's awesome. So you've written about the potential of data visualization for telling stories and communicating information in these different texts we've been talking about. What are some ways that you see that huge potential being achieved today? Are there some specific data viz examples that you see as kind of moving us toward that direction of doing really great communication with visualization?

ALBERTO: 06:29

The famous statistician John Tukey, who is widely considered one of the sort of like the founders of modern visualization, besides being the creator of several, many statistical methods that are still being used today, he has a book about visualization for exploration titled Exploratory Data Analysis, written in the '70s. But he used to say that, "The greatest value of an image," a visualization is in some sense an image, "is that if it is well-designed, it will force us to see things that we otherwise would not see." And he said, in the book he wrote, "Whenever you are analyzing data, trying to understand your data, always visualize your data, because if you don't visualize it, there will be properties in the data that may go unnoticed, certain patterns, certain trends that you might not detect if you don't transform the data into a visual." So that's the great power of visualization.

ALBERTO: 07:28

The great power of visualization is that when we care about the big patterns, the bird's-eye view of the data that we're handling, more than we care about the specific data points, a visualization comes in handy. That's the greatest power of visualization. When we care about the specific values of the specific quantities being represented in the data, then nothing beats a table, the actual data set presented in tabular format. But if we care about patterns, then visualization is great for that. For examples of visualization being sort of like transformative or influential, etc., these days in presentations I'm saying that I think that the pandemic has been, the COVID-19 pandemic has been transformative at showing actually the power of visualization to communicate clearly. Also to miscommunicate, though, to mislead the public as well, right, because visualization has also been misused during the pandemic. But we are in a new era of data visualization, of the popularization of data visualization. That's, I guess, one of the silver linings of all the horrors that we have experienced throughout the pandemic. Visualization is becoming a little bit more popular.

SUSAN: 08:42

Yeah, absolutely. Absolutely. Are there any particular examples you would point to as shining examples of data visualization out of the pandemic that you think were particularly informative or persuasive?

ALBERTO: 08:51

Yeah. I think, yeah, for example, that the work of, in general what we would call elite news media in the countries that I'm most familiar with, being Spain where I'm from, Brazil where I worked for a few years, the United States, some countries in Europe, the UK, etc., in general, the work of news media has been, I think, particularly big news media, has been quite good, in general. I mean, I think, for example, about the work of the Financial Times, for instance, particularly the work of a journalist called John Burn-Murdoch from the Financial Times data team, I think that it has been exemplary. And then big media in the US, The New York Times, The Washington Post, The Wall Street Journal, ProPublica, I mean, they have been doing work that is absolutely, absolutely amazing. And in some cases creating stories based on visualization that have quickly become, I mean, the most popular content that they have ever published. I have many anecdotes related to these, but the most popular story ever published by The Washington Post online is a data visualization published during the pandemic. It's a simulation of what happens in a population if a country deals with a pandemic or not. That story, which was published at the beginning of the pandemic, is the most popular story ever published by The Washington Post.

SUSAN: 10:08

Amazing.

ALBERTO: 10:09

And something similar happened in Spain, for example, with El Pais, which is the most widely read newspaper in Spain. The most popular story that they have ever published is an explanation graphic, an infographic related to the pandemic as well. So [I have?] plenty of examples like that everywhere.

SUSAN: 10:26

Awesome. Yeah, we'll be sure to find some of those and link in the show notes so folks can check them out. So certainly for these pandemic visualizations and other visualizations that we encounter, one of the challenges in creating them is dealing with uncertainty, right, if we don't know for sure what the outcome of a situation might be. You talk about this a bit in How Charts Lie and how we can handle uncertainty in data visualization without misleading people. So could you talk a little bit about that, what some of the difficulties are and what maybe your recommendations would be for dealing with uncertainty in data viz?

ALBERTO: 10:59

Well, that could be the topic of an entire new book or a series of books, right? And there are--

SUSAN: 11:05

I would read it. I would read it.

ALBERTO: 11:07

Yeah. I would read it, too. Or I would write it, I guess at some point.

SUSAN: 11:11

Yeah, please do.

ALBERTO: 11:12

Although there are people who are far more qualified than I am to write about uncertainty in visualization and I could mention their names. So for example, at Northwestern University you have Jessica Hullman, who does research and writes extensively, constantly about the communication of uncertainty through visualization; or Matthew Kay, who's also part of Jessica's group over there; all of them. I mean, they are absolutely, absolutely wonderful [inaudible] [Pareja?] she's another researcher, also works in this area. The challenge I would say though, is that merely showing the uncertainty is not enough. Particularly, we need to remember that the challenge with communicating uncertainty with the general public is that people don't really know what uncertainty is, right? The public, in general, has this very black-and-white, yes-or-no, true-untrue, view of science, right? Because the type of science that we learn in school is what we could call the settled science, right? Evolution by natural selection, that has been corroborated over and over and over again. Never refuted up to this point. Therefore, we could call it settled science. Gravity. Well, we teach gravity because the theory of gravity has never been refuted. I don't think that it will be, but it could be, right? That's settled science. Very little uncertainty. But most science is not like that, right? Most science being produced is very uncertain, very, very uncertain. And that's how it is supposed to be, because science is a set of methods that ideally lead to a self-correcting process, right? And that involves certain uncertainties, etc. Any data set that is being produced will have some uncertainty surrounding all the point estimates that we make and so on and so forth. All scientists know, right, what we talk about when we talk about uncertainty, but the public doesn't.

ALBERTO: 13:10

And that's the main challenge. The challenge is not visualizing the uncertainty, although we need to do that more, right? It would be great if we journalists visualize confidence intervals, for example, when reporting about surveys and polls, if we visualize the confidence interval in there, I think that it would be super useful to make the public more aware of those uncertainties and to find an opportunity to, and here comes the key, to explain what that uncertainty means. It's not just about visualizing it, it's about explaining what it means, right? So the public will get more used to it. That's where the challenge lies. It's not only the visual representation of the uncertainty. It's explaining when it is relevant, when it is more relevant, when it is less relevant, why it is being visualized and what it actually means that we are showing that in the graphic, right?

SUSAN: 14:02

Yeah, yeah. That's really interesting. And I can imagine people too, who are in business settings, who are probably a large portion of our audience listening to you talk about trying to explain uncertainty, trying to communicate what that all means, end up having to do that in a business meeting with an audience of people who may or may not understand what that concept means. I'm sure you've encountered--

ALBERTO: 14:21

Or don't care that much about the nuances in the data and the exceptions in the data, right?

SUSAN: 14:26

Right.

ALBERTO: 14:27

Any communication of data involves the systematic but careful reduction of complexity. And the key is always trying not to reduce too much. That's the key thing, right? There is a quote that is commonly attributed to Albert Einstein. I don't know whether it's apocryphal or not - it's still a great quote - that goes, "Everything should be made as simple as possible, but not simpler." That's the key thing. So that's the way that I usually approach things, such as, for example, should I show the uncertainty and explain it or should I hide it? Well, it depends on what the message is because there are certain types of messages in which the uncertainty is key, and sometimes it is secondary.

ALBERTO: 15:15

Like the example that I usually put to illustrate all this in my classes, particularly for students who are not scientists is to say, if you're reporting the results of a survey in which-- an opinion survey, "Do you care about this or you care about that," yes or no, for example, right? And you get a result that is, for instance, 51% of people say, "Yes," 49% of people say, "No," right? That's one of the cases in which I would show the uncertainty. I will show the confidence interval. Because if the margin of error is, let's say three points around the point estimate, then the uncertainty's part of the story, because you cannot say that one number is bigger than the other, all that you can say is that they are tied. But if the results of the survey were much wider, the differences were, much, much wider than the margin of error, let's say that you have 80% versus 20%, then showing the uncertainty, I mean, it can be done, you can still show it, but it is not as critical as it was in the previous case. That's what I mean by making the things as simple as possible, but not simpler, right? There are cases in which you want to show it. You need to increase the amount of information that you show, in this case the uncertainty of the-- in the graph. But there are cases in which that is secondary. It can still be shown. But if you show it and it overcomplicates the message and it is not crucial to make the message clearer, then you can just basically rule it out, at least for that specific case, right?
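[Editor's note: The survey example above can be sketched in a few lines of Python. This is only a rough illustration of the rule of thumb Alberto describes, not a formal statistical test. The function names are hypothetical, and the 3-point margin of error is taken directly from his example rather than computed from a sample size.]

```python
def interval(share, margin):
    """Return the (low, high) bounds around a point estimate, in percentage points."""
    return (share - margin, share + margin)


def intervals_overlap(share_a, share_b, margin):
    """Check whether the two intervals share any values."""
    low_a, high_a = interval(share_a, margin)
    low_b, high_b = interval(share_b, margin)
    return low_a <= high_b and low_b <= high_a


def uncertainty_is_part_of_story(share_a, share_b, margin):
    """Rule of thumb from the conversation (an assumption, not a formal test):
    if the intervals overlap, neither result can be called bigger,
    so the uncertainty should be shown and explained."""
    return intervals_overlap(share_a, share_b, margin)


# The two cases from the example, both with a 3-point margin of error.
close_result = uncertainty_is_part_of_story(51, 49, 3)  # 48-54 vs 46-52
wide_result = uncertainty_is_part_of_story(80, 20, 3)   # 77-83 vs 17-23

print("51% vs 49%:", "show the uncertainty" if close_result else "uncertainty is secondary")
print("80% vs 20%:", "show the uncertainty" if wide_result else "uncertainty is secondary")
```

[In the 51/49 case the intervals overlap, so the honest reading is a tie and the confidence interval belongs in the graphic. In the 80/20 case they are far apart, so the interval can still be shown but is not critical to the message.]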

SUSAN: 16:42

Yeah, yeah. And we talked about this a little bit in our previous conversation, too. But it just reminds us, again, how much of this is about making those judgment calls. How much information do people really need? What is the overarching story here? A lot of it is, I don't want to say subjective because that's a whole nother issue that comes up with this subjective [crosstalk]--

ALBERTO: 16:58

No, no. It is subjective. I'm sorry to interrupt you in it.

SUSAN: 17:02

No, please.

ALBERTO: 17:03

It's only that it's-- I never see things in black and white. I never see things as objective-subjective. Things are more subjective or more objective, right? So is it a subjective decision? Sure, everything--

SUSAN: 17:14

But that's so complicated Alberto.

ALBERTO: 17:17

It is complicated, right? But when I teach visualization, one of the things that really drives students crazy is that I usually begin the [semester?]-- or classes, any class that I teach, even for professionals. I was teaching one this morning, and I began by saying, "This is a craft. So I am not going to teach you rules. I cannot teach you rules. What I can teach you and demonstrate is a reasoning process that may lead you to make good choices, better choices, rather than worse choices. You can still make the wrong choice, but if you follow this methodology, it will be more likely than not that you will make good choices rather than bad choices." So things are indeed complicated, but visualization is a little bit like writing. There's not really rules to writing besides respecting the symbols and the grammar of the written language, right? Besides that, beyond that, it's very subjective, right, the decisions that you make. Visualization is a little bit similar to that.

SUSAN: 18:15

That's a really interesting comparison. You're making me realize that indeed that is like the writing process. You have advice that you give people on how to survive that process and make it efficient and effective. Because certainly, when it comes to writing revision, most people are not really fans of that process. So when we're doing it with data visualization are there particular tips you would give to people to make that work?

ALBERTO: 18:38

Yes. So I have a specific piece of advice, which is that-- and it's the same advice that I will give anyone who wants to write a book. Writing a book, the same way as designing a visualization, it's easy. I mean, I can write a book in three months, but the first version of what you write is going to be complete crap, right? It will be really, really bad, complete crap.

SUSAN: 18:58

Been there.

ALBERTO: 18:59

It will only get better. Yeah, you know that, right? You've been there, we've all been there, right? The first version of what you write is terrible. The same thing that the first version of a-- the same way that the first version of a visualization is really not that great. So the editing process is sometimes painful. Or the process of getting a visualization critiqued and reviewed by other people in a meeting, it can be painful, it can be difficult, but you just need to assume that that's part of the process. It's not an addition to the process, it's part of the process. The process of getting it critiqued, getting it reviewed, getting it tested, if you can, because that's another thing. The importance of testing our visualizations with actual people. That's painful and it's a humbling experience. Even people like myself who have been in this field for more than 20 years, I get things wrong all the time. All the time. And the only way that I can be made aware that I'm wrong about something is by testing it, by putting it out there and seeing how people respond. So I mean, it may sound-- what I'm saying may sound a little bit trivial or banal, but it is not. I mean, it's part of the process. So writing is not writing. Writing is not writing and designing is not designing. Writing is writing, editing, testing, going back to the drawing board and writing again. And visualization is that way.

SUSAN: 20:18

Tell me a little more about the testing process. What kinds of things do you ask people to do when you hand them a draft of a visualization and ask for their feedback?

ALBERTO: 20:25

Yeah. So I don't do formal testing myself because I'm not a human factors, human-computer interaction researcher. I have colleagues and friends here at the university who are experts on that. So whenever I need to do formal testing, I just collaborate with them because they are the ones who are experts at this. But there is still some informal testing that we can all conduct with our visualizations. And that is as simple as, design your graphic, get a group of people who you think are representative of the type of audience who are going to consume your dashboard, your visualization, or whatever it is, on a regular basis. Sit them in front of a computer. Have them read the graphic, don't bias them, have them read the graphic. And five minutes later come back to them and ask them, what did you learn? And if the responses to that question match what you are intending to communicate, then you know that you have a good piece on your hands. But if they don't match, you know that it is time to go back to the drawing board and try it again. That is not that different, by the way, from formal testing. Those are the types of testing that are conducted that way. It's only that they can also be done in a more informal manner if you don't have the time or the resources or the budget to do formal testing of your visualizations. It's still useful.

SUSAN: 21:40

Yeah, absolutely. And I like your point about not biasing them before they look at it. Don't say, "Hey, come look at my awesome, perfectly clear--"

ALBERTO: 21:44

Yeah. Don't explain it to them. Yeah, exactly. "The legend means that." Although I would say, by the way, that another thing that-- so what I have just said actually is another recommendation that I give people sometimes, particularly people who design business dashboards, for instance, with multi-section displays of data, company data, whatever. Those types of graphics, we need to approach them more as if they were analysis tools. And the same way that when you are going to use a physical tool you need to read the instruction manual in order to use the tool, why don't we have an instruction manual for a dashboard, meaning that you sit with the people who are going to use that dashboard on a regular basis and you train them on how to read the dashboard, right? That explanation can be invaluable when you're putting a new product out, particularly if the product is as complex as an interactive dashboard.

SUSAN: 22:38

Yeah, that's interesting. And it reminds me of your comment in our last conversation about visualizations as tools for reasoning. So it sounds like you're kind of advocating for mentoring people who are going to use this a little bit in the type of reasoning that they might do using that dashboard.

ALBERTO: 22:53

Exactly. You just point it out. You give them clues of how to use that visualization, right? Pointers. Entry points to the visualization. Or you explain the interaction of the visualization because we all have these sort of like-- there are many, many myths surrounding visualization, such as the ever-present, "A picture is worth a thousand words," or, "Show me the data," or, "Show. Don't tell," right? If you need to tell something, then the visualization fails. Well, all these sayings, all these things might be true in the case of simple visualizations. A simple visualization should be as intuitive as possible. But when it comes to more complex visualizations of data, particularly if they are interactive, multi-dimensional, zoom-able, whatever, those are tools and they may require some explaining in order to be used correctly, and in order to extract the right inferences from them. So why not explain them? They will not be self-explanatory. Some people may be led to the wrong conclusions if they don't know how to use the tool well, so why not explain it?

SUSAN: 23:58

Yeah, yeah. Absolutely. I think that's a really interesting point. And of course, it's becoming easier and easier to build interactive or more complex kinds of visualizations and dashboards. I'm curious what, generally speaking, you are excited about in the future of data visualization, anything that you might also have a little trepidation about, things that you're looking forward to.

ALBERTO: 24:19

Well, both my sort of like my excitement and my concerns are related to each other because they are both related to the popularization and democratization of visualization. So I am a big advocate for that. I think that visualization, as you said before based on our prior conversation, I think that visualization is a great tool for exploration, communication, understanding, and reasoning. It's a fantastic tool for all that. So it expands our perception, our cognition. It lets us see things that we cannot normally see, as John Tukey used to say. And at the same time, it's a language for all those purposes that I greatly believe that anybody can learn. It's not that difficult. It's not magic. If we can learn how to write, then you can learn how to visualize, not only visualize data, but information in general. Information design, which is a much broader field than data visualization, that also can be taught. I teach it, how to design information, how to tell a story visually. For example, like in a comic-book-like, panel-based presentation, step-by-step, right? That would be an example of-- one example of many of information design. So on one hand, I'm excited by the fact that more and more people are aware that this is a professional field, but it is also a language that can be used by anybody. And I think that, again, that the communication during the pandemic has made a substantial portion of the public be aware-- not only the pandemic, but also the 2016, 2020 election, right? Visualization also acquired a huge presence in news media. So it's being used more and more and people are getting used [inaudible]-- even in video games. I mean, video games have data visualization nowadays, right?

SUSAN: 26:04

That's true. That's true. Yeah.

ALBERTO: 26:06

Huge, huge area of growth. I'm super excited about all that. I want everybody to embrace visualization, everybody to learn visualization. To learn how to read it and how to use it. At the same time that also opens the door to huge amounts of misunderstanding, misinformation, and so on and so forth because visualization can be used for good or for evil similarly to many other tools. So we need to also, as a society, become aware of that fact and become more literate and more ready to deal also with a potential flood of misinformation in the form of bad visualizations or misinterpreted visualizations.

SUSAN: 26:47

So Alberto, are you playing a lot of video games with visualizations in them? Is that where you're encountering these?

ALBERTO: 26:51

Not really. I mean, I don't play many video games. I play plenty of board games, tabletop board games, and some of them are shaped as visualizations, right?

SUSAN: 27:03

True. True.

ALBERTO: 27:04

Yeah. So think about a game, for example, titled Terraforming Mars, which is exactly about that. It's about--

SUSAN: 27:11

That sounds--

ALBERTO: 27:12

And the table, the game, the board of the game, it looks like a dashboard. Levels of oxygen, levels of greenery, some oceans that you can create on Mars, etc. It's like a data dashboard of Mars, right?

SUSAN: 27:28

Wow.

ALBERTO: 27:28

So yeah, yeah. It's amazing.

SUSAN: 27:30

That's cool. I'm going to have to check that out. That's actually a really good transition to something else I wanted to ask you about, more on a personal level. Following you on Twitter, I see that you do a fair amount of drawing on your own and just as an artistic pursuit for yourself. And I'm curious how that practice has influenced your interest in and your thinking about data visualization.

ALBERTO: 27:54

Yes. So I began my career in '97. And I didn't begin as a data visualization designer. I began as an information designer. And information design certainly includes data visualization, the visual depiction of quantities of numbers, but it also involves explaining things visually in general. So how a machine works, how a car works. There is an accident somewhere and you explain how the accident happens. That involves doing an illustration, right, of the road and the car, etc. So I began my career producing more of that type of graphic, more pictorial representational, more graphic. So I know how to draw a little bit, as you saw. So I've always enjoyed--

SUSAN: 28:39

Oh, I would say more than a little bit.

ALBERTO: 28:42

Well, because I've been practicing quite a lot lately.

SUSAN: 28:42

That's impressive. Yeah. Yeah, that's awesome.

ALBERTO: 28:44

I have been recovering my skills. Those are long-lost skills.

SUSAN: 28:47

That's perfect.

ALBERTO: 28:47

I used to be relatively good when I was a teenager, and then I abandoned those skills a little bit when I transitioned towards data visualization. So for like a decade and a half, I didn't draw anything. But then I started recovering those skills little by little. And I'm really enjoying the process. I draw during meetings. So that's one way to keep my hands occupied so I would not be tempted to click on Twitter or something. When I'm in a meeting, I have my hands busy drawing something. And it really helps me keep my attention focused on the meeting. Other people prefer to do sort of like random doodles or geometric drawings. I do actual illustrations. I make actual illustrations when I'm in Zoom meetings. So it helps me relax. It helps me concentrate. It helps me-- it's almost like an activity that puts me in a state of flow. Are you familiar with the book Flow, right?

SUSAN: 29:38

Yeah, yeah.

ALBERTO: 29:39

So that book describes this state of mind in which you are so immersed in an activity that you're almost in a state of complete relaxation and mindfulness, not thinking about your own thoughts, right, but completely focused on something that your hands are doing. So that's a state that drawing puts me into. Visualization sometimes does that, right? So when I design a graphic, I may get so into it that I would completely forget about the outer world, yeah.

SUSAN: 30:08

That's awesome. Would you, not that drawing has to have any kind of practical application by any means, but would you recommend doing some sort of artistic practice to people who are trying to do better data visualization? Do you see connections there?

ALBERTO: 30:23

Yes, absolutely. I mean, I would not say that you need to become an illustrator or a painter or whatever, but learning about graphic design, visual design can really help, will really help. So learning about how to use color effectively, learning about composition in art and how to put those things in practice. Not just reading about them, but actually practicing, drawing, that really helps people. And learning how to sketch things out effectively. How to, for example, create a diagram of ideas like an idea diagram or a mental diagram or whatever. That's all related to design, right? And it's an artistic process, obviously, but it's also a process that will help you reason better about things, right? The same way that-- I usually tell people-- and I actually had just this conversation with one of my Ph.D. students over here, who was trying to explain to me an idea that she has for her dissertation. And she couldn't explain the idea clearly. And I said, if you cannot explain the idea clearly to me, if you cannot write that idea clearly, it's probably because you don't understand the idea. One of the best tests to know whether you understand something is if you are forced to teach it or to write about it. If you can explain it clearly, it's probably because you understand it clearly as well. So drawing is a little bit like that. If you can draw something clearly like a scheme or a diagram of an object, of the insides of a machine or whatever, and you can draw that accurately and with all the specifics, it's probably because you understand the inner workings of that machine really well. So drawing is also useful to test your own ideas, to reason about your own ideas. And again, learning a little bit of art is also useful for other reasons. It's like composition, layout, etc., balance, hierarchy. All those concepts that are borrowed from the world of visual design are useful for data visualization for sure.

SUSAN: 32:23

Yeah, absolutely. And I love what you were just saying there. I think the idea of drawing out your ideas and interacting with them that way, it makes me think of doing exploratory data analysis but on your own thinking, right? Being able to actually diagram it out and see what's in there, what's actually in your thoughts. That's cool. So one thing that I always ask our guests on Data Science Mixer is - we call this the alternative hypothesis - what's something that people often think is true about working with data or about data visualization, but that you, in your experience, have found to be false?

ALBERTO: 33:00

Well, I know that this is going to sound obvious to most of your listeners, but I still stumble upon this problem all the time. The fact that people tend to believe that it is the data that matters. And it is not the data that matters. It's like the founder of the TED conferences, Richard Saul Wurman - who now lives in Miami by the way, he's a friend - he likes to say, "People care a lot about big data. But what we really need is not big data, it's big understanding." So that is different. So the fact that data sometimes tends to become a goal on its own, right? In philosophy-- and I love reading and learning about philosophy, that's why I once dreamed about pursuing a career in epistemology, right? People in philosophy tend to talk about instrumental goals or instrumental objects of thought, or instrumental goals and final or ultimate goals, right? Data and the tools I use to manage, to handle, to explore, to analyze, to visualize the data, that's not an ultimate goal. That's an instrumental goal, right? It's an instrument. It's an instrument. It's a machinery. It's a device that we use. A tool that we use to achieve something higher. And that something higher is clear communication or understanding, right? So what matters in data is not the data, it's the interpretation of the data. And to find an interpretation of the data that is rigorous, that corresponds to reality, and at the same time, that may be useful, right? So that's one of the-- I would say that that's one-- the most common misunderstanding, I think. Yeah.

SUSAN: 34:39

No, I think that's a great point. Anything that we haven't talked about that you want to be sure to get in there or anything else that you want to say about data viz, data science?

ALBERTO: 34:49

Yes. I would say that when it comes to visualization, and this is one of the topics that I cover at length in the latest book, in How Charts Lie, it's that we are all victims and we fall for the current way of thinking all the time, including people who are well-trained in analysis and visualization, to assume that we understand a visualization just by looking at it, right? We are quickly looking at it. "Oh, I understand this thing," right? Because we tend to approach visualizations sometimes as if they were mere pictures or images. And I usually say, "Yeah, sure, it's an image." But a visualization is more or should be approached more as if it were or as if it is text, something that you need to read in order to understand correctly. And again, to your audience, this may sound a little bit trivial, but it's absolutely not trivial. Being someone who has been working in visualization, again for more than 20 years, I still find myself misinterpreting visualizations because I assume that I understand them at a quick glance. It happened to me recently. I took a look at a graphic assuming that something-- that the graphic was communicating something, and then being asked to go back to the graphic once again and told, "Take a second look at it," and then realizing that I completely misinterpreted the graphic just because I was too quick at trying to interpret that graphic. So never assume that it's a design ornament, that it's just an image that we put in the paper, in the report. Approach visualization as if we are making or reading an argument. And that argument, in order to be understood correctly, needs to be paid attention to.

SUSAN: 36:36

Well, I think you've given us a lot of different ways to think about data visualization. It's kind of like the poem-- what's the poem? The Thirty Ways of Looking at a Raven, something like that. I'll fix this later.

ALBERTO: 36:47

Yeah. It's a little bit similar, right? A visualization can be understood in different ways sometimes. Yeah.

SUSAN: 36:53

Yeah, yeah. Which I think is super interesting, and will definitely give our audience a number of new ways of thinking about their work. So thank you so much.

ALBERTO: 37:00

Thank you again. Thank you again for having me.

SUSAN: 37:08

Thanks for listening to our second Data Science Mixer chat with Alberto Cairo. Join us on the Alteryx community for this week's cocktail conversation to share your thoughts. Alberto talked about how data visualizations help us see things we otherwise couldn't see in our data. Trends and patterns that might not be visible when your data are in a tabular form, but that jump off the screen when you make a visualization. Have you experienced a moment when a visualization helped you see something otherwise unseen in your data, maybe with important or useful consequences? Share your thoughts and ideas by leaving a comment directly on the episode page at community.alteryx.com/podcast or post on social media with the hashtag DataScienceMixer and tag Alteryx. Cheers.

 


 

This episode of Data Science Mixer was produced by Susan Currie Sivek (@SusanCS) and Maddie Johannsen (@MaddieJ).
Special thanks to Ian Stonehouse for the theme music track, and @TaraM for our album artwork.