Data Science Mixer

Tune in for data science and cocktails.
MaddieJ
Alteryx Community Team

Margot Gerritsen, Stanford professor and a founder of Women in Data Science, joins us for our debut episode.

 

 



 

Join this episode's Cocktail Conversation by commenting below!

 


Transcript


SUSAN: 00:01

[music] Welcome to Data Science Mixer, a podcast featuring top experts in lively and useful conversations that will change the way you do data science. We've got top-shelf insights and happy hour fun on the menu today. So grab your favorite beverage and a snack and get ready to learn and enjoy. I'm Susan Currie Sivek, the data science journalist for the Alteryx Community, and I sat down with Margot Gerritsen.

MARGOT: 00:25

I'm a professor at Stanford University in Energy Resources Engineering, and I'm also affiliated with the Institute for Computational and Mathematical Engineering. And as part of my work, I'm the co-founder and co-director of the Women in Data Science Initiative.

SUSAN: 00:43

Wonderful. Thank you. And do you mind sharing with us which pronouns you prefer?

MARGOT: 00:47

Oh, she/her.

SUSAN: 00:48

Okay. Perfect. And for all the fans of the terrific Women in Data Science podcast, you'll recognize the legendary Margot as its host. We talked about her passion for making data science a more inclusive field and her colorful career of fascinating data projects. I see references here to sailboats and sustainability, and there's petroleum technology, and there's even a pterosaur in the mix. [laughter] At the end of the episode, we'll have a special Cocktail Conversation starter, where we invite you to share your thoughts and learn from others on the Alteryx Community or on social media, using the #TopShelfData. So be sure to stick around for that. Plus, at the end of our conversation, you'll find out Margot's alternative hypothesis, the myth about data science that she'd like to challenge. And now enjoy our chat with Margot Gerritsen. Well, Margot, I have to say, I am a little bit starstruck to talk with you because your Women in Data Science podcast has totally been a guide to me and so valuable to me during my own journey into data science over the last few years. So thank you, first of all, for doing such amazing work there.

MARGOT: 02:04

Oh, that is so unbelievably nice to hear, Susan. I'm glad.

SUSAN: 02:09

Yeah. It's really wonderful. It's been great to hear those stories and really inspirational. And as you know, one of our themes here is that we like to have a little drink or a happy hour snack or something as we're chatting. So I think you may have something there with you?

MARGOT: 02:26

I've got a lovely cup of Peet's Coffee.

SUSAN: 02:29

Excellent. Excellent.

MARGOT: 02:30

Yeah. I have to say, that's my favorite coffee in the world. And I've been a Peetnik, as they say, since 1990. [laughter]

SUSAN: 02:38

Wow. Excellent. That's quite an endorsement.

MARGOT: 02:41

Yes, yes, yes. Many, many, many pounds of coffee beans have somehow helped me in my career and also in WiDS. [laughter]

SUSAN: 02:52

Definitely. Definitely. Valuable fuel for all of us. I had my coffee already, so I'm now on my cup of black tea. That's usually my second cup of the day. So I'm having a celebration tea, it's called, with apricot and spices, quite delicious.

MARGOT: 03:05

Wow. That sounds excellent.

SUSAN: 03:08

Yeah. Fun stuff. Well, Margot, I would love to kind of start at the beginning with you and hear a little bit about how you got into data science. We get to hear through your podcast so many stories of other women's journeys into the field, but I would love to hear how you ended up in data science and how your background shapes the work that you do today.

MARGOT: 03:28

Yeah. Sure. I'll be delighted. So I'm, by training, a what we would call computational mathematician. So I started training in that area in the '80s. So that shows you how old I am now. Been in this field way too long. But I did this sort of computational mathematics, applied mathematics as undergrad. And I chose math at the time because I was so interested in many different applications. I was thinking, "Oh, maybe I want to understand fluid dynamics. Maybe I want to do something in geophysics. Maybe I'd like to do some mechanical engineering." For a long time, I wanted to be an ornithologist to really understand bird flight. And I couldn't make up my mind when I was 18 and was going off to college. And so I thought, "I'll study applied math because that gives me a skill set that I can apply to many different fields and maybe that will give me the versatility to jump around a little bit in my career."

MARGOT: 04:33

And then that was back in Holland, where I was born and raised. And I've really enjoyed working with computer simulation codes, building the algorithms, of course, generating a lot of data. With some of these codes, you build, if you'd like, a virtual laboratory for, say, a fluid flow process, and then you start running it. And of course, you get I don't know how many bytes of data that way, right, observational data, that then later you would have to explore. So that's now what we call data mining. And so I really started in data science in the data generation part through my own simulation codes. The biggest example of that was a little while later, when we built a code with a group, and I was co-PI on that, to simulate the flow in Monterey Bay, and I don't know how many terabytes of data we generated. And we certainly haven't mined all of it. In fact, a lot of that sort of data is still in storage. One of the exciting things about data science is that now we have the tools to go back and actually explore those data. So if you look at NASA, Boeing, National Laboratories, and also academia, so many simulation codes have been written and so much data has been generated that is really just sitting on a bookshelf, maybe on a big tape or on a disk somewhere, and that is just waiting to be explored, which is kind of exciting. It's like a treasure trove.

MARGOT: 06:19

So that's how I started. And then at some point in my career, I ended up at Stanford and studied some more in fluid mechanics. And after spending five years in New Zealand, I came back to Stanford. And then about 11 years ago, I became the director of the Institute for Computational and Mathematical Engineering at Stanford. And there, in 2010, when I started, I really felt, "We need a special master's degree program in data science." And we had data science available to students through computer science, through statistics, but we in computational mathematics are sitting at this nice little interface where you do use computer science and you certainly use statistics, but you also use these computational mathematical tools. For example, linear algebra or some topological data analysis, graph theory. And there is so much wonderful cross-fertilization going on in computational mathematics, taking bits from computer science and, of course, taking the foundations from statistics. And then you can really build some of these practical applications. So we thought we had our own flavor of data science. And we built this master's program, and that's when I really got excited also on the educational side.

SUSAN: 07:41

Yeah. Absolutely. Yeah. That's a great opportunity to take advantage of so much expertise in so many different areas there at Stanford.

MARGOT: 07:48

Yeah. It's really a nice little-- well, sometimes they call it a pea soup of people coming in with all kinds of different backgrounds. And you put it all in that really big pot, and then you stir it, and you cook it for a while. And then somehow what comes out is pretty delicious. So, yeah, no, it's been wonderful. And then at Stanford and in the Bay Area, and of course through lots of peer institutes and colleagues that I work with through SIAM, for example, our professional organization, the Society for Industrial and Applied Mathematics, you get in touch with so many people in the very broad field of data science. And it's been wonderful to learn from all these colleagues. And then in 2015, we decided to start within this institute an activity which we called Women in Data Science just because we wanted to promote women in this field who really weren't promoted all that readily. And so we started this conference, and we hit a nerve, and it's grown into this global initiative, which has just been amazing. So gradually, that took more of my time. And now I'm no longer director. We have a wonderful successor. And he's keeping the institute running, and I'm keeping WiDS running.

SUSAN: 09:05

Wonderful. What a great opportunity to create an organization to make women's lives in data science more productive and interesting and useful. That's awesome. And I want to come back to Women in Data Science here in just a bit. I'd love to go back to your own career for just a moment because you were talking about bringing in all of these different fields into data science and exploring these intersections among these different fields. But I find your own history of professional work super interesting because you've also used data science in a bunch of different areas. I see references here to sailboats and sustainability, and there's petroleum technology, and there's even a pterosaur in the mix.

MARGOT: 09:42

[laughter] Yeah. That was a small data project. Small data. [laughter] But you remember that I said that I studied math because I was hoping it would make me a little agile?

SUSAN: 09:55

Yeah.

MARGOT: 09:57

And then I decided that I would actually walk the talk. [laughter] And so every so many years, I get this itch where I think, "I'd really like to learn about something new." And sometimes I think my colleagues despair a little bit when I get this because then I go off on some tangent. But I've been so lucky. So when I was a PhD student, I started looking more at fluid mechanics. I've always been fascinated by fluid flow. I mean, how amazing is it just observing it in the sky or in the ocean? And that always really fascinated me. And then I went to New Zealand for five years, and I don't know how many of you know New Zealand, but it is a fantastically beautiful island nation that has an unbelievable coastline. And no surprise, a lot of research that's done in New Zealand is focused around coastal ocean waters. And so I started doing some coastal ocean modeling, which was really fun, and learning about that. And then I started doing some sailing because I had two students-- or actually, three students at some point who were working on Team New Zealand related projects for the America's Cup. And so I got really keen on that too. And one of my former students is now still a designer with Team New Zealand. So that was fun.

MARGOT: 11:22

And then I went back to Stanford and started Stanford Yacht Research. And right now, that's just me and zero dollars. [laughter] So we're really not doing anything. But I was hired back in a department that at the time was called Petroleum Engineering. Now we're Energy Resources Engineering. And so I started doing fluid simulation in subsurface reservoirs. Not just for potential oil and gas, but these things also, of course, apply to aquifers and to things like carbon sequestration and so on, and I spent a lot of time working on that and trying to mitigate harmful environmental impacts of oil and gas development. Yes. And then because of my sail work that I was still sort of carrying on, one day, National Geographic called me and--

SUSAN: 12:18

Oh my.

MARGOT: 12:18

Yeah. That was totally--

SUSAN: 12:20

That's exciting.

MARGOT: 12:21

That was very exciting, and it was because I put a website up for Stanford Yacht Research. And at that time, it was me and two students and maybe $10,000. That was about all we had. And they were looking for somebody who could help with a pterosaur replica project that-- and the goal was to build a glider, sort of a flying replica, at scale of a large pterodactyl. And they wanted in this documentary on pterosaurs-- they called this documentary Sky Monsters, and it's available on Amazon. And they wanted to follow this design team throughout this documentary to show the development and the design. And then, of course, they were hoping for some crashes and some drama, and we gave it to them. [laughter] And they said, "Well, you do sail research, and pterosaur wings look a bit like sails. Would you be interested in being an adviser?" And I, of course, said yes because that sounded really fun.

SUSAN: 13:25

Wow. Yeah. [I believe you?].

MARGOT: 13:27

Yeah. But then I ended up playing a bigger role in that project than I'd originally anticipated, and I ended up sort of taking over as PI, principal investigator, on that, of course, with the help of lots of my colleagues because I did not know very much about aerodynamics. But it was this other thing of, "Hey, let's dive into something completely new and crazy and build a remote-controlled replica." So that was by far, by far, the craziest project I've ever been part of.

SUSAN: 13:57

I love it. I think that's amazing. So we're talking about a real-life replica. How large was this ultimate replica that was built?

MARGOT: 14:06

It was about close to, let's say, two to two and a half meters wingspan, and the actual pterosaur, we estimate between three- and five-meter wingspan. So we built it at scale.

SUSAN: 14:19

Wow. That's--

MARGOT: 14:20

And it was partly a success. So I was really quite proud of the team for what they did in [gliding phase?]. And so you can see that in the documentary. And then some drama happened, and Herkie--

SUSAN: 14:34

Reality TV. [crosstalk].

MARGOT: 14:35

Yeah. Reality TV. Herkie, as my son called it, the model, crashed. And he was four or five at the time, so he named it. And--

SUSAN: 14:43

Oh. He must have loved that project.

MARGOT: 14:45

Yes. He was part of it. I was a sort of part-time single mom at the time, and so I took my son everywhere, and he loved it. And it was mostly a weekend and evening project because I was working on my tenure in something totally different, so. But anyway. Yeah, so it's fun.

SUSAN: 15:07

And I think it's so interesting because I can imagine some of our data scientists out there listening to this and thinking about, "Wow. Being able to move from field to field, that's such an incredible aspect of this profession, being able to apply your knowledge in so many different areas." But I do feel like it takes a certain kind of confidence and curiosity to do that. How do you muster that within yourself to explore these different areas and feel confident doing that? [music] Okay, everyone. I know we all want to hear Margot's secret to confidence and success in exploring different fields with data science, and I promise we'll hear her answer in just a moment, but first, let's take a quick break.

S3: 15:50

Hey, everyone. This is Tyler Heinl. I'm a product manager working on the open-source software at Alteryx. Today I'd like to highlight EvalML, which is an auto modeling library built in Python. EvalML is a one-stop shop for supervised learning problems. It contains everything you need to build a supervised machine learning pipeline. So we're talking preprocessing steps, objective functions, and custom objective functions, plenty of supervised learning models, and an auto modeling tuner. Finally, once you get that auto modeling pipeline, it contains a model understanding function so you can really understand how that model performed and what you might need to tweak. In just a few lines of code, you can get a pipeline that is tuned against an objective function of your choice via Bayesian hyperparameter optimization. We have this up on our GitHub page, which is github.com/alteryx. Additionally, you can check out our documentation and some tutorials on evalml.alteryx.com. Stay tuned for additional demos and updates. You can get the latest news by following us on Twitter @AlteryxOSS.

SUSAN: 16:55

Thanks for telling us all about EvalML, Tyler. And now let's get back to Margot Gerritsen, who was just about to tell us how she finds the confidence to venture into whole new areas with her data science expertise.

MARGOT: 17:06

Yeah. Well, I don't feel very confident doing it. So it's not that I have confidence. At some point, you do it often enough that you think, "Okay. I'm going to panic for a while now, but I've survived many times before, so it's probably going to be okay." But think of it as diving into cold and deep water, like you're in a mountain lake in the summer. And you know that water is going to be very cold, and yet you dive in, and then you sort of sink or swim, and it's going to be okay. So typically, I go in because I really like learning, and so that drives me. And then usually, I have a couple of months of sheer panic where I think, "What am I doing?" [Again?], a total imposter and feels very uncomfortable. But then I remind myself, "But I'm learning so much." You're on this learning curve, and it's super steep, but how fun is it to be learning for work? And then after a few months, you start to understand things a little bit better.

MARGOT: 18:15

And I'm blatantly honest most of the time. When I was younger, it was a little bit harder to be so honest about my lack of knowledge. But I try to say now, "Hey, teach me," or get a group of courageous students around me who are willing to learn with me. I just recently dove into a new project on transportation modeling, where we're interested in the full decommissioning of internal combustion engine vehicles, and that is new for me. And I have four students in this group, and they know this is new for me, it's new for them, and so we're exploring this together, and it's super fun. Also, hard for the students, but I think it's really, really good to sort of learn to be comfortable with the uncomfortable because that's what research really is about. And if you're comfortable most of the time, I don't think you're really learning all that much.

SUSAN: 19:11

That's a great point. And what a wonderful thing to be modeling for your students too, that it's okay to feel like you don't know something. It's okay to enter a new area and to be the learner, so [that's terrific?].

MARGOT: 19:22

Yeah. A panic-stricken adviser is probably quite the experience for my students. [laughter]

SUSAN: 19:31

But you get through it. You work through it.

MARGOT: 19:33

We have fun doing it. I think with a lot of that, it becomes a little easier when you don't take yourself too seriously. Now, of course, it's easy for me to say now at this stage of my career because I can take a risk. For my students, of course, they're much braver than I at this stage. I do see in a lot of students and a lot of people around us such a hesitation to jump into something new. There is just a lot of fear of not performing. I see this with a lot of the students too. They also have a problem with ambiguity. I think that's probably because in high school and middle school, we sort of beat it out of them. They get this feeling that every question has one right answer. And if you don't get exactly that right answer, then you're not good enough. And of course, that's nonsense, right? Most questions are not even well-defined. And most of them have multiple answers, depending on your point of view. And I wish that they were a little bit better prepared for that because they have to unlearn a lot of things when they get to the research stage. You always end up with more questions, right, than you end up with answers, and that can also be very disconcerting.

SUSAN: 20:43

Yeah. Yeah. Absolutely. And I think one of the other things that's exciting about exploring those ambiguous and unclear areas-- and something else that I wanted to address with you. I wonder, as you've explored these different areas and different disciplines, if you've found interesting areas of overlap that maybe you expected or didn't expect. Are there ways that the ocean studies informed the pterosaur project and informed the reservoir project? I'm sure there are some interesting ways that those ended up connecting that maybe weren't things you anticipated early on.

MARGOT: 21:16

Oh, absolutely. When I took a class in linear algebra - for those of you who have done that, you know it's about computing things with matrices and vectors, and this is a very core part of some areas of data science, for sure, recommendation systems, page ranking, searching; so much is based on matrix algebra - my instructor at the time and mentor in this field said, "Deep down, when you dig deep enough, almost every engineering problem is really built on linear algebra." And I did not really accept that straight off. I was young. I wasn't so experienced. But that is something I've certainly found since, that there are these building blocks in mathematics and statistics, these foundational concepts that you find in every field. And so if you look at the intersection of reservoir modeling, fluid flow modeling, sail design, pterosaur design, and other things that-- search engines that I've helped build or recommender systems, what they have at the core is linear algebra. And so if you can do this translation of the problem that you're looking at into this language of matrix computation, then they're all similar. And the challenges are all similar. That's the interesting thing. So most of these problems-- and now I'm using some jargon, but most of these problems with very large matrices, they're ill-conditioned, and so it's tricky to work with them. And so that was one thing that I certainly found, and that's one thing I'd like to teach my students.
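
For readers who want to see what "ill-conditioned" means in practice, here is a minimal sketch. It is an illustration only, not something discussed in the episode: the 2x2 matrix, the right-hand sides, and the helper name `solve_2x2` are all assumptions chosen to keep the arithmetic easy to follow. The rows of the matrix are almost parallel, so a tiny change in the data produces a large change in the solution. Exact fractions are used so that the sensitivity comes from the matrix itself and not from floating-point rounding.

```python
from fractions import Fraction


def solve_2x2(a, b):
    """Solve the 2x2 linear system a @ x = b using Cramer's rule."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    x1 = (b[0] * a22 - a12 * b[1]) / det
    x2 = (a11 * b[1] - b[0] * a21) / det
    return x1, x2


# Hypothetical, nearly singular matrix: the second row is almost a copy of the first.
A = [
    [Fraction(1), Fraction(1)],
    [Fraction(1), Fraction(10001, 10000)],
]

# Two right-hand sides that differ by only 0.0001 in the second entry.
b = [Fraction(2), Fraction(20001, 10000)]
b_perturbed = [Fraction(2), Fraction(20002, 10000)]

x = solve_2x2(A, b)
x_perturbed = solve_2x2(A, b_perturbed)

print("solution:          ", [float(v) for v in x])
print("perturbed solution:", [float(v) for v in x_perturbed])
```

The first system has the solution (1, 1), while the perturbed one has the solution (0, 2): a change of 0.0001 in the input moves each component of the answer by 1. Very large matrices from simulation or recommender problems can behave the same way, which is why they are tricky to work with.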

MARGOT: 23:11

The other area where all these problems, really, probably across STEM and many, many fields really intersect is in the design of the solution method because ultimately, when you start solving something, the first stage is always the problem definition. You'll never have a clear-cut problem. And if you do, if it is really well defined, then maybe it's no longer that interesting, right? So you spend a lot of time sort of thinking, "What am I actually after, and what is the goal of this? How will I design my solution approach?" You need to understand your boundary conditions. You need to understand your initial conditions. You need to understand the players, the stakeholders, the different points of view. Are you trying to extrapolate out? Are you trying to predict? Are you trying to optimize? All these things are very, very similar. And then across many of these physical things that I've worked on - reservoir engineering, sail flow, airplane design, wind turbine placement optimization, or other things that I've done - the underlying physics is all very similar. And the mathematical equations that you use to represent that physics are very similar. So they're all sort of systems of partial differential equations. And again, they're similar in nature.

MARGOT: 24:40

You may have multiple time scales and spatial scales. You may have strong nonlinearity in your system. And so, yeah, it's maybe, in hindsight, not so surprising. But what is surprising is that there is actually way too little synthesis between fields and too little cross-fertilization. And you see this even now in data science, which in its current form is kind of a new science, but it's surprising how there are different niches built, and so there is a particular conference where a certain approach is always used. Then there are other organizations or conferences where people look at it from a math point of view or from a statistics point of view or from an applications point of view or from a computer science point of view. And some people formulate things using graphs. Other people formulate things using matrices. And there is absolute connection between them, and there could be such cross-fertilization if people talked a bit more. But I think it's very common for any field that is broad to subdivide into little niches and little bubbles. And so I like being somebody who tries to pull these different people together and have them learn from each other.

SUSAN: 26:02

Yeah. Absolutely. And is that kind of-- is that what you see as the way to address those silos, by having those conversations, providing a forum for those conversations?

MARGOT: 26:13

Yeah. For sure. I think that what can happen in any research environment is that you become sort of an echo chamber. You hire people with similar sort of background, similar sort of foundation, similar sort of outlook on things or approach to problem-solving. And before you know it, you're really in this bubble, and you have to break through. And sometimes, in fact, I think that's happening with companies in Silicon Valley and also in certain academic centers, absolutely, right? You see this siloing. And you don't even always know until maybe you mess up something a little bit, right? So you do something that turns out to be poorly designed or biased, and people start complaining about it. It happens sometimes with companies. And then you realize, "Oh--" if you're a really good manager, and you start thinking, "How did this come about? How did we get to this end product that really was flawed?" Usually, I think the end result or the conclusion you draw is, "Hey. We didn't really have enough diversity in the people that were working on this problem, and we weren't challenging our own design decisions enough. There was really an echo chamber." And at some point, you don't see it anymore until you get confronted with the harsh reality; let's put it that way. So sometimes it's through pain that you say, "Okay. Now I've got to work with some other people."

MARGOT: 27:48

But you could try to do this by exposing people to other trains of thought, by bringing people together in workshops. And there are some wonderful examples of this. And you don't get everybody. I mean, if you follow some of the influencers, if you'd like, in data science on Twitter, you see that. There's a lot of-- too much animosity sometimes where there are different camps claiming that their approach is superior to others. Others don't know what they're doing. This is why professional organizations that are very diverse play such a big role and why conferences that are broader rather than narrower can be a real eye-opener and why something like arXiv is really good, where people can put research papers, and there is more of a free dissemination of research work. Unfortunately, with some of the big conferences in, say, deep learning or AI, you see a real narrowing of the representation because it's the same sort of groups and the same sort of universities that get to present their stuff. And that's because there is a thirst and a real hunger for a certain type of AI right now or deep learning that really can only be done by certain groups because they require really large computer power and a lot of data that is really not democratized because not everybody has the hardware nor the data repositories to really work with, so. There's a little bit of banging going on right now where--

SUSAN: 29:33

Yeah. That's okay. Not a problem.

MARGOT: 29:36

The clashes of working from home.

SUSAN: 29:38

Yes. Exactly. Exactly. I'm amazed my dog has been this quiet so far. So hopefully, I didn't just jinx that. [laughter] So far, so good. Definitely. I would love to go back to something you mentioned just a moment ago, which is arXiv. And I would love to hear a little bit about maybe how you use that as a tool to stay current and to learn. Do you have any tips for folks who would like to use that as a way to keep their data science practice current and maybe diversify it a bit?

MARGOT: 30:05

Yeah. I like the practice of sort of serendipity, I call it. [laughter] So when I was a student, so this is a long time ago, there was nothing really online, right? Everything was just in journals. And the only way to understand what was going on, apart from conferences, is that whenever a new journal volume would come out, you would go to the library, and you would sit down, and you'd read them, right? And so I had this practice of every week setting aside three, four hours to just be in the library and do that. And that was really great because the journals are often quite broad in what kinds of papers they accept. You got exposed to different groups. And then sometimes I would find a group that I thought had a really interesting paper, and I would start looking at their previous publications and sort of do a little autopsy, if you'd like, of the group. What have they published? And I do this with my students too. So we say, "Hey, that's an interesting group. They're claiming this. Now, let's see what they've done in the past, what sort of tangents they went on, what they said 10 years ago would work and what they've really continued to work on or not," because most papers say this is the best thing since sliced bread, but then sometimes you see in the history that they're not following up. So obviously, it wasn't that great of a bread at all. So I did this.

MARGOT: 31:29

And now with arXiv, that's so much easier, right? So I highly, highly recommend to people just spend a couple of hours every week to dive into arXiv and just start browsing now and follow some authors. Follow some groups. See what they've done, which directions they've gone in, because unfortunately, most papers are not super honest about shortcomings of an approach. They tend to say, "Look at the amazing work we've done, and here is selected proof." And very seldom do you get papers to say, "Hey. The algorithm we designed works pretty well on this, but actually, never use it on this sort of data or keep in mind that it took us three months to tweak the parameters in this CNN so that we actually get something that looks really reasonable, but we're not going to tell you about this."

SUSAN: 32:26

Yeah. Let's not talk about that part.

MARGOT: 32:26

Yeah. That's right. And so then you have to do a little bit of investigating. Say, "Okay. What have these people done in the past? Where have they gone?" And then you can learn a lot from that, and that's super fun. And then the other thing then, of course, is just connect with folks. That's one thing I also see, such hesitation in students to send an email to an author and say, "Hey, I read your paper. I really like it. Can we get on the phone, have a little conversation? I have some questions." And that seldom happens. So I occasionally get somebody saying, "Hey, I found this paper, and I have some questions about it." And of course, I love that. And so most people are very open to discussion. And now with arXiv, it's so much easier to discover. But I think most students just don't take that time. They always feel that they have to spend a few more hours on their code or a few more hours on their homework assignment. Yeah.

SUSAN: 33:27

And researchers are scary, right? I mean, oh, very scary people. [laughter]

MARGOT: 33:31

Yeah. I have to remind myself of that sometimes, that I'm probably also a little scary to people. But if anybody here listening has any questions about anything I've done, send me an email, and I will answer.

SUSAN: 33:47

Oh, thank you.

MARGOT: 33:47

I absolutely will. Or connect on LinkedIn. I love hearing from others. Also, if you don't agree with something I say, let's discuss.

SUSAN: 33:56

Yeah. Yeah. That's awesome. Well, thank you. Thank you for that offer. That's very generous of your time. Appreciate that. Yeah. Thank you for talking about arXiv a little bit more. I think those are great tips and recommendations for folks to expand their own horizons and practice a little bit. I'd like to come back to Women in Data Science. Because as you mentioned, getting people together, sharing across disciplines, that certainly seems to be part of the WiDS mission as well. I'm curious, from your perspective through WiDS, through the organization, through the podcasts where you've interviewed so many amazing women, are there kind of recurring elements of women's experience in data science that have stood out to you as maybe unique to their experience or trends that you've observed as you've had those conversations over time?

MARGOT: 34:43

Absolutely. So with data science, we're in the same sort of state for women as we have been in the broader field of computational mathematics or scientific computing for decades now. When I was a student, there was maybe 10 to 15 percent women in that broader field. And with data science, it's probably about the same. It's actually a little bit hard to get the right numbers. And so all women that I've talked to and have heard share the experience of being by far the minority, right? So being the odd one out. Very, very common experiences to be the only one in the team, to be the first one in the division, or to be one of the very few. And I think that until you get to about 30% representation, you're going to stay different. And so that comes with challenges. It also comes with-- I've always felt it was quite balanced. It comes with some opportunities, and it's absolutely not all bad. But for those of you who are listening who have never been the underrepresented gender, it can be very strange. And it can make you feel a little isolated. And it's very common to not be included as much, to just always being seen as a little different.

MARGOT: 36:12

Now, on the bad end of the spectrum, sort of the difficult end in the spectrum, you do find some misogyny. You do find the occasional mansplaining. You do find glass ceilings that seem to be hard to break through. And most of the time, I think women deal with the sense of-- that they're invisible. And it's funny because when you're different, you stand out. But at the same time, you're not really part of the club. And so sometimes you feel a little invisible. So those are very common experiences. On the positive end, you see a lot of these women incredibly excited about the potential of what they do. This is data-driven decision-making. That happens everywhere, right? And so it penetrates all industries and all research areas. And so it's important in business. It's important for NGOs. It's important in healthcare. And so all of the women that I've met and talked to are incredibly excited about the potential of this field and what they can do with this, what they can contribute to it, and also, very excited about the possibility to really create wealth. Not necessarily for themselves, right, but this is-- data is the new oil. It's the new gold. It's the new bacon, whatever you want to call it. And so there's--

SUSAN: 37:42

Did you just say bacon?

MARGOT: 37:43

Yeah. They say this is the new bacon.

SUSAN: 37:46

I haven't heard that one yet. That's great.

MARGOT: 37:48

[laughter] So it's the new what have you. And so there's an enormous wealth creation going on in this field, and this is also really what drives me to promote women because if you have this unbelievable wealth creation, and with wealth comes power and influence, and that is owned, for the most part, by one gender and for the most part, by one or two ethnicities or races, you're in a whole lot of trouble, I think. I think that's very unfair, whereas data science really does have intrinsically the ability to really globalize and-- be globalized and democratized because you're not geographically constrained like you were in the past, with previous wealth creation of mining or oil and gas, which came with enormous wealth and enormous power.

MARGOT: 38:51

Of course, with data, you need access, and of course, you need data. You need computing power, but you don't need to sit on top of a mine. And so the infrastructure needs are much less. And you need knowledge, of course, but there are amazing educational systems around the world, really. And so I got really excited about this at first in thinking, "Wow. This is a real chance for us to see some true sharing of this new wealth across cultural backgrounds, across cultures, across genders." But it's not really happening. And this is one of the reasons why I started with WiDS. And that sort of frustration you see also in the women. And also, at the same time, there's hope for this field that it really could empower people across the globe. And we've seen that at a small scale, right? We've seen wonderful work happening in Africa and Asia. And with WiDS, we are in over 70 countries across all continents but Antarctica, but we're working on Antarctica. [laughter] We want to be there too.

SUSAN: 40:07

Got to get somebody down there.

MARGOT: 40:08

Yeah. That's right. There is a research station. Ross research station on the Ross Ice Shelf. And we did connect [crosstalk]--

SUSAN: 40:15

They should have data.

MARGOT: 40:16

They absolutely have data. The problem is they don't have great bandwidth--

SUSAN: 40:21

Yes. That makes sense.

MARGOT: 40:22

--so it's a little bit more difficult to get them involved.

SUSAN: 40:26

That's funny.

MARGOT: 40:26

But hey, if you're listening from Ross station in Antarctica, please connect with us. We'd like to have you [part of us?].

SUSAN: 40:31

That would be awesome. I would love to know that too. That's so cool.

MARGOT: 40:35

But anyway. So you see this hope for this field and this desire in the women also to really help with this, to empower young women, but also women that are already in the field and practicing, to support them and to promote them and to show that they're doing great work and, in that way, really lowering the barrier to entry because having role models makes a big difference, seeing, "Hey, I'm not the only one." And I'll tell you one, maybe two quick WiDS stories. There was this girl from a small village in India who got to hear about WiDS, and she was following us on her telephone. And she ended up being so inspired, saying, "I can do this. I'm good at math. I can work on this." And she ended up ultimately in the United States, studying there and then joining a company, which was amazing. And when we started in Bolivia, I remember some of our Bolivian WiDS ambassadors saying, "Hey. We want to organize a WiDS meeting in Bolivia." And I think we have four female data scientists in La Paz, and I think I know them all. And then realizing that there were 60 to 100 women that actually showed up. So a wonderful way to--

SUSAN: 41:56

That's awesome.

MARGOT: 41:56

--network and get to know others and share the beauty of this fantastic field.

SUSAN: 42:04

Absolutely. Well, and Margot, you've done such amazing work in bringing together these women across the world into the field. You're truly a role model for the things that you've done and your efforts in this. So thank you for that. I have a couple of last things to ask you. Well, okay, I'm going to be really honest, but we have a recurring segment that we're going to launch for the show. So you're actually the first person who gets the recurring segment. So it's not technically recurring yet.

MARGOT: 42:30

I love it.

SUSAN: 42:31

It will be eventually [laughter]. So it's called the Alternative Hypothesis. And we're trying to ask the same question to our guests to kind of debunk some of the myths and ideas that are out there about data science. So I'm curious, is there something that you think people often believe to be true about data science or about being a data scientist, but that you personally think is actually incorrect?

MARGOT: 43:00

Absolutely. There were two myths that have been, both of them, the bane of women in data science. Here are the two. One is that to be a successful data scientist or a computational scientist or anybody working in this intersection of statistics, mathematics, and computing, you must have really strong, innate ability. That's one myth. So what I mean by that is that many people believe that if you don't have a very high level of natural talent, you're never going to make it. Now, I'm not somebody to dispute that having a high level of natural talent helps. Of course, it helps with athletics. It helps with arts. It helps with everything, right, including data science. But there is this myth that without a high level of this, you cannot be successful. And that's just wrong because it ignores this growth aspect of people, right? And so I never considered myself to be somebody with genius-level math at all, but I am somebody who likes to learn and is a little stubborn and doesn't give up so easily and really wants to understand something. So I have this growth mindset, which I think is essential, not so much this innate ability. That's one myth.

MARGOT: 44:23

The second myth is really, really unfortunate for women. And that is the myth that this sort of innate ability that people believe you must have to be successful is more common in men than in women. And this is still after decades of debunking and showing both in science and studying it as well as just from the data that this is not true. Still, many people believe in this. And many women believe in this. So if you believe that innate ability is really, really important and you believe there is a threat and you don't have what it takes probably also because you're a woman, well, that combination is really lethal. And so yes, both of them have been debunked, but they're still causing a lot of young women and also older women to give up or not enter the field because they don't feel they have what it takes. And if you then enter into a culture that is a little different and you feel a bit left out, it's easy to conclude then if you're not part of this community because you're different that you're not part because you, indeed, don't have what it takes. [music] I just want to say, Susan, thanks so much for having me on. It was a real pleasure to talk to you, and good luck with your new podcast here. This is brilliant. And I know I will be listening to every episode that comes out.

SUSAN: 46:16

Oh, well, thank you so much, Margot. I'm extremely honored and flattered to hear you say that and to have you on the show. Thanks for listening to this Data Science Mixer chat with Margot Gerritsen. Let's continue the conversation. With every episode, we'll have a Cocktail Conversation on the Alteryx Community and social media. For this episode, let's talk about how you keep your knowledge current in the fast-changing field of data science. Margot offered us some great tips for using arXiv to keep up with the latest research. What's your favorite way to learn about new and hot areas of data science that apply to your work? Share your thoughts on the Alteryx Community or on social media using the #DataScienceMix and tag Alteryx. We'd also love to see a snapshot of the treat you enjoyed during the episode. Thanks again for joining us. Cheers.

 


This episode of Data Science Mixer was produced by Susan Currie Sivek (@SusanCS) and Maddie Johannsen (@MaddieJ).
Special thanks to Ian Stonehouse for the theme music track, and Jenn Ho for our album artwork.

Comments
NeilR
Alteryx Community Team

Great idea for a cocktail conversation - I'm always looking for new data science news outlets. Quanta Magazine is my go-to for great science journalism, including machine learning and AI. I also follow KDnuggets for more hands-on tutorials and new libraries and stuff like that - a lot of it is re-posted from Medium but I think it's well-curated. What about you, @rafalolbert?

CristonS
Alteryx Community Team

I love sharing resources! Right now I'm obsessed with the Stanford Social Innovation Review (SSIR). I even get to attend their Data on Purpose conference in Feb, to learn from researchers, policy makers, and leaders of nonprofits and foundations. Topics include human-centered AI, civic tech, and How to Not Use Data Like a Racist, for a start.

 

@chriswilliams41 you've been in the "data for good" biz forever - where do you go to stay current with data science stuff?

chriswilliams41
8 - Asteroid

@CristonS, that's a hard question because it's a plethora of different outlets. I come here to Alteryx Community because it's the best technical community out there, period. I learn so much from each of you on here. It's great. I just try to keep myself fresh with analytics use cases I see with my clients. Even if I see similar use cases, I try to find multiple ways of solving those problems.

 

From an Alteryx standpoint, I just try to learn tools section by section and apply them with the Weekly Challenges. I combine that with staying on top of dashboarding tools like Tableau, Power BI, and IBM Cognos. I always try to find ways to develop datasets in Alteryx and send those to other tools for additional reporting. This allows me to be a technician in those toolsets as well as instinctive because of the situations I was in to come to decide on those tools to execute the plan. It's an ongoing process. Familiarity with different work situations eases professional stress.

 

These are just my thoughts. 🙂

mbarone
15 - Aurora

One of my go-to ways to learn new areas of data science (DS) is to watch my feeds on LinkedIn & Twitter, as well as Google. Those will usually lead me to some very good articles about new topics and areas, which will lead me to investigate further. An example is when I saw an article come across my feed on LinkedIn regarding using AI to detect cancer cells, and it being a significant improvement from a human looking at cells under a microscope. Never thought about applying DS in that way, but it was fascinating to learn about how to train and tune the algos to detect the bad cells.


I guess just constantly being curious about new developments is my "favorite" way!

ggruccio
13 - Pulsar

My favorite way to get up to speed is to purchase a great textbook and work through the exercises in the Python tool in Alteryx.

 

Three that I have worked on are all from O'Reilly:

 

Python for Data Analysis 

Data Science from Scratch 

Hands-On Machine Learning with Scikit-Learn, Keras, & TensorFlow 

 

Each of these books approaches different topics in DS. This may be a time-consuming approach!  But I find that I learn best by doing - and evaluating results - and by debugging where I may not have entered code correctly!

LeahK
Alteryx Community Team

Love this topic! As someone who's just starting to dig deeper into Data Science myself, the more recommendations on where to go to learn, the better.

 

@SydneyF a friend and colleague, who also happens to head up a team of Data Scientists here at Alteryx, recommended a few books to me when I posed a similar question to her several months ago:

(not exactly "new areas of data science", but hey -- for me, someone who's building a foundation -- these books have been AWESOME)

 

@mbarone -- Are there any recent articles or resources that stood out on your social feeds? Also, I don't know about anyone else, but it's been a while since I've updated my content preferences on LinkedIn, and I get a lot of nonsense -- do you subscribe to any specific topics, groups, or companies that you'd recommend?

mbarone
15 - Aurora

Hi @LeahK! Nothing specific that I subscribe to - lots of analytic/data science/AI info comes to my feed. I'm guessing it's based on what I have listed as "skills" (LinkedIn). And then once I started clicking on them and reading them the algorithms keep sending them to me. Surprisingly I don't get too much junk.

rafalolbert
11 - Bolide

Hi Team/Community,

 

Some resources from me:

 

1) websites/portals/blogs:

https://towardsdatascience.com/

https://www.kdnuggets.com/

http://www.analyticsvidhya.com/blog/

 

2) podcasts:

https://podcasts.apple.com/us/podcast/data-skeptic/id890348705

https://www.oreilly.com/radar/topics/oreilly-data-show-podcast/

https://www.superdatascience.com/podcast/

 

3) people to follow on LinkedIn:

- Andrew Ng

- Cassie Kozyrkov

- Isaac Faber

 

My latest life-hack is that my org has access to O'Reilly, but the book format is not my favorite - not as quick, visual, and interactive as some other ones. Just recently I was pretty desperate to read this title: https://www.packtpub.com/product/transformers-for-natural-language-processing/9781800565791, which is available on the O'Reilly platform, so here is what I did - I've installed Natural Reader Text to Speech in my Chrome browser and it reads the book aloud to me. It also highlights the sentences as the book progresses and gives me great quality audio in real-time, which I can tune for speed etc. I'm re-discovering books again and this is very exciting, maybe this content delivery trick could work for others as well!

 

Thanks,

Rafal

 

#Excuse me, do you speak Alteryx?