Data Science Mixer

Tune in for data science and cocktails.
MaddieJ
Alteryx Community Team

Alberto Cairo, author, researcher and internationally recognized expert on data visualization, joins a special video episode of our Data Science Mixer podcast. We’ll discuss best practices, including communicating complex data, dealing with 2020’s outliers, and preparing ourselves and others for the future of data visualization.

Cocktail Conversation


What questions do you have about data visualization?

 

Join the conversation by commenting below!

 


Transcript

SUSAN: 00:00

Data visualization is such a fascinating part of the work that we do with data. [music] It's a language in itself. It's simple and complicated at the same time. It presents ethical questions and challenges. And, yes, it can be beautiful. I'm thrilled to bring you today's conversation with data visualization expert and author Alberto Cairo, which we originally shared as a special video session at our virtual Inspire conference hosted by Alteryx. Unfortunately, due to some technical difficulties, we only had a short time to record with Alberto, but he generously offered to join us again soon for a longer session. So watch for that in your feed. In the meantime, we're going to whet your appetite for more data visualization conversation. Let's jump right in.

SUSAN: 00:47

Hello, everyone, and thank you so much for joining us today for this special episode of Data Science Mixer, the podcast from Alteryx, where we talk to experts in lively and informative conversations that will change the way you do data science. I'm Susan Currie Sivek, the data science journalist for the Alteryx Community. I'm so excited to be joined today by Alberto Cairo. Alberto is a journalist and designer, and the Knight Chair in Visual Journalism at the School of Communication of the University of Miami. He's also the director of the visualization program at the university's Center for Computational Science, and he has directed the creation of information graphics at media organizations in Spain and Brazil. Additionally, he's written three amazing books on data visualization, the latest of which is How Charts Lie, in which he explains how charts can inform but can also lead us astray. Alberto has so many insights for anyone who creates data visualizations and wants to communicate both clearly and accurately. Alberto, thank you so much for being here.

ALBERTO: 01:46

Hi, Susan. Thank you so much for having me. It's a pleasure.

SUSAN: 01:49

Awesome. So I thought we would just jump right in with something that you wrote in How Charts Lie. I think often when people are creating data visualizations, they think, "Oh, I should try to make this simple," assuming that being simple means being clear. But you actually offer a different perspective. You suggest that charts with rich and deep messages might require some time and effort to read, and that this effort will pay off if the chart is well designed. "Many charts can't be simple," you said, "because the stories they tell aren't simple." So how can somebody creating a data visualization know when they've achieved that balance between simplicity and complexity?

ALBERTO: 02:28

Well, that's a topic almost for an entire course, right? I devote a couple of weeks of my classes to discussing examples and showcases, but I can try to summarize. Let me begin by saying that this is not a new idea. Like 99% of what I write, it's an idea that appears in one form or another in the work of many other people who also write about, and do, and make data visualization. So I'm thinking about Cole Nussbaumer Knaflic in her book Storytelling with Data, Stephanie Evergreen, Jon Schwabish. And prior to these people, Stephen Few, who has a wonderful book titled Show Me the Numbers, Edward Tufte a little bit earlier, and even earlier, John Tukey, the famous statistician, or William Cleveland, another statistician who also wrote about data visualization. And we all share this idea that a visualization is not just an illustration. It's not a picture. It's an image that has a scaffolding, and it has a grammar, and it has a set of symbols that are arranged in one way or another. And in order to understand that visualization, you need to learn how that grammar works.

ALBERTO: 03:48

And a visualization should never oversimplify the information that it presents. This is a big problem in the world where I come from, the world of journalism. We journalists sometimes tend to oversimplify the stories that we present to people. We just show, for example, a median or an average when we should be showing the entire distribution of the data, because the data is very skewed, for example. So in my classes, and also in my books, I explain the distinction between simplification, which I don't think is the goal of visualization, and clarification. The goal of a visualization is not to simplify. The goal of a visualization is to clarify, which is completely different, because when people talk about simplification, what we have in mind is usually a reduction: removing detail so the important information rises up. It pops up, so you see it immediately.

ALBERTO: 04:49

My friend Nigel Holmes, who's a famous infographics designer, has this same idea: our goal should not be to simplify. Our goal should be to clarify. So sometimes, in order to clarify, you need to reduce the amount of information that you show, but sometimes, in order to clarify, you need to increase the amount of information that you show, in order to put the information that you're presenting into the right context. Now, how do you decide what amount of data, what amount of detail, what amount of information to show? There are really no clear-cut rules. Every visualization is different. You need to take into account the nature of the data, the nature of the story that you're trying to tell, and the nature of the audience that you are designing the visualization for. There are many, many factors that we need to weigh in order to come up with the right level of detail. Not too much detail, but not too little detail either.

SUSAN: 05:45

Sure. Yeah, that's so interesting. And I love this discussion of clarity and simplicity. Those are great concepts to explore here. One of the--

ALBERTO: 05:54

If I may interrupt, I would like to clarify, by the way, that the whole idea of simplicity is widely misunderstood, because simplicity is not really simplification. I love to recommend books. And one of the books that I like to recommend to my students and to people I know is The Laws of Simplicity, which was written by the computer scientist and artist John Maeda. In the book, John says that simplicity is not just reduction. He says simplicity is about subtracting the obvious and the meaningless, but then adding the meaningful. So it's a balance. Get rid of the things that don't really help you tell the story, the things that don't make your visualization more understandable or more beautiful, because that's another very important function of visualization, to be attractive. So reduce those, get rid of those, but then think about whether you need to add more to the visualization in order to make it either more attractive or more understandable.

SUSAN: 07:05

Nice. Yeah, that's terrific. And thinking about the beauty and the aesthetics, that's so important, too. One of the stories that I know a lot of people will be wrestling with how to tell is the story of 2020. Right. 2020 is the year that produced 1,000 outliers. The year that, as somebody said online, broke the Y axis. What do we do with all of these measures that are suddenly so different from their normal and expected values? What would your suggestions be for people who are creating data visualizations, who have to deal with these really unexpected variations, and who are trying to provide the necessary context and tell a story that still makes sense?

ALBERTO: 07:46

Yeah, like when you design a time series line graph, and you have a sudden spike that is so disproportionate that it goes through the roof, and then all the other values are at the bottom. Or you have a scatterplot in which all the dots are in the bottom left corner, and you have a couple of outliers in the upper right corner. Or a bar graph in which you compare, let's say, the population of different countries. And, obviously, China, India, Brazil, and the United States are gigantic, and then all the other countries are super tiny. What do you do in cases like that, when the data diverges so much? There are many strategies that we can use for this. I describe some design strategies in some of my books, not so much in How Charts Lie, although I also talk there about how to decide what the right Y axis and the right X axis are, but more in my previous book, The Truthful Art. And one of the strategies that I use all the time follows advice that I usually give to people, which is that sometimes a single visualization is not enough. You need to have more than one visualization. So if you have this bar graph of country populations in which you have three or four bars that are gigantic, China, India, etc., and then all the other bars over here, design that graphic first, but then put a bracket around the smaller countries and create a second graphic in which you zoom in on those countries, so you can also compare them to each other. That's if the graphic is not interactive, obviously. If the graphic is interactive, just make that area a button that people can click on, and then they can zoom in on those values that are super tiny in the original graphic.

ALBERTO: 09:23

So there are many strategies that we can follow. In the case of metrics that vary a lot, there are other types of strategies, such as using non-linear scales when the goal of a graphic is to show not the absolute change in a metric but the rate of change, how many times it multiplies. Sometimes we use logarithmic scales, for instance, or non-linear scales of different types. But I must say that the problem with non-linear scales is that most people don't understand them really well. So I usually tell people, and this is another design strategy that I use all the time, that a visualization is not just the visualization. The visualization is the visualization plus the explanation that you append to it. So if you use a particular technique or a particular type of visualization that you think will look unfamiliar to your readers, a strange type of visualization or a scale that is non-linear, you'd better be ready to explain it, to put a callout box in there saying, "Hey, take a look at this. This is not your traditional linear scale. Be careful. This is the way to read it." And then you explain it to people. So there are several strategies that we can use.

SUSAN: 10:45

Yeah, that's great advice. Great suggestions. And it reminds me, too, of how much of creating data visualizations and working with data is kind of subjective. This is something that you point to a lot in your writing: numbers and data seem scientific, they seem objective. But in the ways we choose to represent them, like the decisions that you just described, and the ways we choose to interpret them, there's some subjective stuff going on in putting them in that larger context. So there's this question around graphicacy and how well people are able to do those things. You've talked about graphicacy a bit. What does that mean to you? And do you think we are achieving greater graphicacy out there in our society?

ALBERTO: 11:31

There's a lot to unpack in there. Let's begin with the concept of objectivity. Among the many obsessions that I have, I like to read about tons of different stuff, from the ancient history of the Mediterranean all the way to statistics, journalism, and design. One of the things that I like to read about is epistemology, the theory of knowledge, the philosophy of knowledge. And I have developed all these sort of weird ideas of mine, which are non-technical. I'm not a professional philosopher, so I cannot claim any sort of expertise in any of these areas. But what I usually tell people, in very casual language, is that I don't think that things are ever fully objective or fully subjective. That's not how the world works. We cannot see things in black and white. Things, and when I say things, I mean claims, visualizations, whatever message we're trying to convey, are never 100% objective or 100% subjective. It's a spectrum. They're either more objective or less objective.

ALBERTO: 12:38

So how do we make our work a little bit more objective? Well, in the case of a visualization, or the analysis that underlies that visualization, the degree of objectivity will depend on how rigorous the methods that we apply to generate the data and then visualize the data are. The more rigorous, the more tested, the more double-checked, the more objective that visualization will probably be. And I wish that this idea were pushed to the general public a little bit more, because the general public sometimes makes this very strange demand that science be 100% objective. That's impossible. We are human beings. We are not robots. We make mistakes. And science is a self-correcting process in which objective elements and very subjective elements interact with each other. With some rigor, we can make our claims a little bit more objective, but never fully objective, because otherwise they would not be prone to correction. They could not be corrected.

ALBERTO: 13:41

In any case, going to the idea of graphicacy now. Graphicacy is essentially a neologism, a strange word that was invented in the 1950s to refer to graphical literacy. Literacy, graphicacy. It's part of a list of types of literacy that people allegedly should have today, an idea that was pushed years ago and was promoted and popularized by other authors. For example, Mark Monmonier, who is a cartographer, has this wonderful book titled Mapping It Out. And in Mapping It Out, Monmonier says that today, in order to be an educated citizen of a democratic republic, people should be educated not just in literacy, the ability to read and write. That's obviously the foundation, the core. That's the first thing that you need to learn. But there is much more than that. You also need numeracy, which is the ability to reason scientifically with numbers. So it involves statistics, mathematics, and so on and so forth, but it's not reduced to those. It's not reduced to statistics. It's more like developing a sixth sense related to numbers. That's the way that I usually explain numeracy to people. When you see a number or a graphic in a newspaper or a magazine, there is a sixth sense in the back of your mind that says, "Let me look at this with a little bit more attention. There's something strange here." That's numeracy at work. You don't know exactly what is wrong with it. There may be nothing wrong with it, but at least you have that sort of alarm in the back of your brain.

ALBERTO: 15:25

So we have literacy. We have numeracy. We have articulacy, which is the ability to express yourself through spoken language. And then we have graphicacy, which is the ability to read and/or produce visuals, either to understand data, explore data, or communicate data. Because visualizations are not just means for communication. A visualization can also be a tool for reasoning, for better thinking. It's just that you can only use it effectively if you have a certain degree of visual literacy, of graphicacy. As for the last part of your question, do I think that we should push graphicacy as a society? Absolutely, yes. I wish that reading visuals was taken more seriously in educational systems: teaching people, first, how to read graphics of different kinds, not just data visualizations, but maps, diagrams, explanatory graphics, illustration-driven graphics, and so on and so forth. And as an extension of that, how to produce them, how to use those types of graphics to communicate with others, but also to understand data better ourselves.

ALBERTO: 16:43

And the reason why I say this, and why I feel very strongly about it, sorry to be so long with the answer, is that through my own experience, I have seen the value of graphics, not only to communicate with others, but also to understand information yourself. Whenever I read a book, seriously, if I want to remember what the book is about, I draw a diagram of the content of the book, a sort of network diagram in which I put all the things that the book is about, and then I connect the concepts to each other. And that diagram functions as a mnemonic device. Years later, I may not remember what the book is about, but I can go to that hand-drawn, very rough visualization, take a look at it, and remember what the book is about, what the main concepts in the book were. I wish that this type of knowledge were more widespread in society.

SUSAN: 17:34

Oh, it's great. And I can imagine everybody now starting their own mind maps, and diagrams, and all the things that--

ALBERTO: 17:39

Yeah, mind maps, if you want to call them that. That would be the right term, I guess.

SUSAN: 17:44

So I know we have limited time. But I wanted to ask you about the phrase you just used: data visualization as a tool for reasoning, not just for communication. With that concept in mind, are there examples or stories from your experience where you've seen data visualization used really well, having a serious impact on a business or an organization?

ALBERTO: 18:09

Many, but I cannot speak about specific organizations, because I have nondisclosure agreements with clients. I work with some governmental agencies, let's put it that way, that use visualization on a regular basis to analyze and reason about their data, and then also to communicate the results of their analysis to people who are not necessarily well-trained in statistics. Another client of mine is a warehousing company that sells construction products, let's put it that way. They use infographics to communicate internally. So they don't use Excel spreadsheets. They actually design not just data visualizations, but full-blown infographics with pictograms and icons, to communicate ideas internally in the company. And that is amazing. That was incredible when I discovered that they did that. And so, yeah, I see many, many examples. I see visualization growing, which makes me really happy. As I said before, I think that visualization is a language that anybody can learn, anybody can take advantage of, and it makes us smarter if we know how to use it really well.

SUSAN: 19:25

[music] Absolutely. Well, I think that's a great point for us to end on. Alberto, thank you so much for sharing your insights into data visualization and some really concrete tips that I think our audience will be able to take to their everyday work as they're creating and consuming data visualization. So thanks so much for being on Data Science Mixer.

ALBERTO: 19:42

Thank you so much for having me, again. That was a pleasure, Susan. Thanks.

SUSAN: 19:48

Thanks for listening to our Data Science Mixer chat with Alberto Cairo. Be sure to keep an eye out for our next longer conversation with Alberto coming soon to your feed. Since we have another chance to talk to Alberto soon, what questions do you have about data visualization? Let's dedicate this week's cocktail conversation on the Alteryx Community to your questions and thoughts on the art and science of data vis. Drop your questions in the comments, and maybe you'll hear an answer from Alberto in our next interview. Share your thoughts and ideas by leaving a comment directly on the episode page at community.alteryx.com/podcast, or post on social media with the hashtag Data Science Mixer and tag Alteryx. Cheers.

 

 


 

This episode of Data Science Mixer was produced by Susan Currie Sivek (@SusanCS) and Maddie Johannsen (@MaddieJ).
Special thanks to Ian Stonehouse for the theme music track and to @TaraM for our album artwork.