Data Science Mixer

Tune in for data science and cocktails.

How do you move from discussing AI ethics in the abstract to putting them into practice? Abhishek Gupta, founder of the Montreal AI Ethics Institute and a machine learning engineer at Microsoft, shares tools and best practices and encourages data scientists to share and learn from failures. 








Cocktail Conversation


Have you had an experience where sharing one of your own failures ultimately led to a positive outcome, where enduring some temporary awkwardness led to a later success? Tell us about that experience. Maybe you'll inspire someone else to be more open about their own failures and help us all move forward!


Join the conversation by commenting below!





Episode Transcription

ABHISHEK : 00:00

People will talk about SHAP or they'll talk about LIME or all of these very specific techniques, which disconnects it from the larger picture. And not just the larger picture from the AI life cycle, which is the technical aspect to things, but also from how does this fit within your organization? [music]

SUSAN: 00:21

So whose job is it to make sure that AI is used ethically? And how do we bridge that gap from big statements and principles to the everyday work of engineers and data scientists? Welcome to Data Science Mixer, a podcast featuring top experts and lively and informative conversations that will change the way you do data science. I'm Susan Currie Sivek, the data science journalist for the Alteryx community. I'm thrilled to be joined by Abhishek Gupta for a chat about these important issues of AI ethics. We discuss how to take ethical principles out of the realm of philosophy and into the everyday work and structure of organizations. He's addressed this topic in his upcoming book, Actionable AI Ethics, and we're excited to share his insights and recommendations with you. Let's dive in. [music] Could you give us a little introduction of yourself; your name, where you're currently working, and if you don't mind telling us the pronouns that you use?

ABHISHEK : 01:22

Yeah. Absolutely. So I'm Abhishek Gupta. Born and raised in India. Moved to Montreal in 2012 to study Computer Science at McGill, graduated, did a stint at Ericsson doing cybersecurity and machine learning, realized that I had a greater passion for machine learning than I did for cybersecurity, and subsequently moved to Microsoft, where I am currently, where I work on a team called Commercial Software Engineering, where I do machine learning full-time. And in addition to that, I serve as the Responsible AI board member for my organization, which means that I get to touch on issues of ethical AI, guiding our internal teams on how to implement some of these ideas in practice, which, of course, is a-- I guess as we'll get a chance to go into it, a huge thing for me. It's been an important part of my life. And of course, I co-founded the Montreal AI Ethics Institute, which is an international nonprofit research institute that is focused on democratizing AI ethics literacy. We do that through various mechanisms, including fundamental research, community outreach, and translating complex knowledge into things that are digestible for the lay audience.

SUSAN: 02:37

Excellent. Yeah. With such an important topic, it's wonderful that you're doing that kind of outreach to make sure that people can understand what's going on behind the scenes. Pretty cool.

ABHISHEK : 02:44

Yeah. And yes, you asked, my pronouns are he and him.

SUSAN: 02:49

Okay. Terrific. Thank you. Yeah. So as you know, on Data Science Mixer, one of the things that we like to do is try to have maybe a happy hour type snack or drink or coffee or tea or something while we're chatting. So are you having anything special there with you?

ABHISHEK : 03:03

Hydrating. I think hydration is important [laughter], so just water. It's late in the evening for me, so caffeine is out of the picture. I'd like to get a good night's sleep. Anything after 2:00 PM doesn't really work. [laughter]

SUSAN: 03:17

I'm with you there. My cutoff is noon, so I'm a lightweight when it comes to the caffeine. So it's 7:00 AM for me right now so I am having some nice Earl Grey, which is very exciting. Good deal. All right. Well, we are well-hydrated and caffeinated when appropriate so we can start chatting. Terrific. So you told us a little bit about your career path and how you got interested in AI ethics. What was it specifically that drew you to that area? Was there some particular experience that you had or something you worked on that you're comfortable talking about that made you especially interested in AI ethics as an important issue?

ABHISHEK : 03:54

Yeah. So it all began, I think, with my attendance at the AI for Good Global Summit in Geneva, so the inaugural summit that happened in 2017. Was put together by the ITU, the UN agency. And what was fascinating was this was one year prior to the launch of the GDPR, right? So the conversations in Europe around privacy, especially, were quite mature, well-articulated, were quite strong in their voice. And what was interesting was that I realized very quickly when I was there that it was going to be an important issue, not just from a GDPR perspective because of my work doing cybersecurity at Ericsson, but also, the machine learning side of things where I started to see how this inclination to soak up as much data as possible to build more and more accurate systems meant that it would have privacy implications. And it wasn't a realization that others hadn't had, but at least for me personally, it was something that I got more interested in and wanted to have more conversations around.

ABHISHEK : 05:01

And coming back to Canada, what I realized was that the conversations were quite fragmented, actually. So they were happening in silos and they were quite sporadic, and there wasn't really a national unified sort of focus on that. And so I took it upon myself, having a little bit of an entrepreneurial streak myself, I guess, to bring the community together because I noticed two things that were happening. There were these sort of barriers, both self-erected and those erected by others, which hindered participation from people that came from non-traditional backgrounds. So people who, say, didn't have a PhD in machine learning or who didn't have wanted academic credentials were typically not allowed to be a part of the conversation. And in speaking with some folks as I was getting started with this initiative, which started off, really, with a few people in a room back in 2017, was that people from other fields had a lot to offer. And when I think back to it, I think the first real driver for me was speaking to someone from the field of bioethics and how they thought about the ideas of informed consent and how they went about doing that, how they went about tackling some of the ethical issues when they arise in the medical sciences was what made me realize that these problems at their foundation, they're the same and they've been experienced in other fields, and there's a rich body of literature and practice where people have tried to solve these issues. And they're just now being expressed in the field of AI in a different form. And so there is a lot to be learned from that. And so that's sort of how at least my journey began in the sense of bringing together people. So it's really been about community for me in terms of kick-starting my work in this space because I realize that there is so much to learn from other people and we don't have to reinvent the wheel. I mean, why not stand on the shoulders of giants already?

SUSAN: 07:08

Yeah. Yeah. That's awesome. I love the way that you expressed your entrepreneurial spirit in pulling together this community and recognizing the interdisciplinary nature of the problems that you're facing and the value of using some of those existing insights that are already out there. That's very cool. So your book, which is coming out either end of this year or early next year, is titled Actionable AI Ethics. I think it's really interesting that pulling together this community, drawing on those interdisciplinary perspectives, you have this really strong focus on, "Let's take those things and take action," right? And actually do some things that have an impact on the world. So why has that been such a strong focus for you in your career, this idea of finding actionable paths forward?

ABHISHEK : 07:53

So I think, one, I would say I have an engineer's bias because I'm an engineer. And I mean that not in a bad way that I don't see these as sociotechnical problems because I do and that's a huge part of my work. But I mean it in the way that if you go out and do a quick search even for responsible AI principles or guidelines and all of these things, there are literally more than 100 documents that talk about that, right? I mean, you can go to the OECD AI Policy Observatory and you'll see more than 100 documents. You will see guidelines coming out from corporations, etc. And I think in my observation, we've arrived at a sort of universal consensus on what are the key things that we want to be focusing on, right? And what that means is that now it really should come down to us trying out these ideas in practice because what I've observed speaking to folks who worked at start-ups, folks who worked at other large organizations, of course, observing the work internally at Microsoft, and what I've seen is that a lot of these ideas while great in theory, when it comes to practice, need a bit more nuance for them to be really applicable. And I feel that there is this gap between theory and practice at the moment where we come up with all these great ideas in terms of bias mitigation or applying even, let's say, differentially private analysis in practice. And what are the real trade-offs that happen when we try to put those techniques in play? And the kind of guidance that we provide at a very high level seems to have sort of settled, as I was saying, in terms of a sort of universal consensus.

ABHISHEK : 09:40

But when it comes to actually deploying it in practice, the level of granularity that engineers need, especially when they don't have prior formal training or experience in the field of ethics or in the social sciences, it's a real struggle. And I'm sure the listeners will agree that when you're on a business deadline or a project deadline, you don't necessarily have all the time in the world to go out and search for different kinds of literature and see what's the state of the art, especially on this side of things because unfortunately, at the moment, this is something that seems secondary to the primary sort of business objectives of, "Yeah. We got to build a product that delivers value X, Y, and Z to our customers." And the ethical aspects aren't necessarily included as a core value offering, let's say. So I think that's one of the things that's been a problem, hence the focus on it being actionable. But also, this just overwhelming amount of information, I think, it creates a deterrent for anyone to try and do something, right? It's a little bit like trying to go on a diet, let's say. If someone throws 25 different diets at you, [laughter] you're confused, right? You don't know where to start versus someone-- it's Barry Schwartz's paradox of choice, right? The fewer and more carefully thought out choices you provide, the higher the likelihood that you actually go out and do something with it rather than just sort of keep muddling about and thinking about, "What is it that I should do?" Because I think we've largely settled on at a high level what we should be doing and now it's really a matter of trying it out.

SUSAN: 11:31

Right. Right. That's so interesting. And as far as the techniques that you mentioned, you mentioned specifically bias mitigation and differential privacy. Would you like to talk about one of those in a little bit more detail as far as some of the ways that you see people implementing them and actually using actionable AI ethics?

ABHISHEK : 11:50

Yeah. So, in fact, the way I like to think about all of these techniques is fitting in a larger picture, right? And as a part of this actionable AI approach, actionable AI ethics approach, think of it in the MLOps context because I think that's a natural analog in terms of how to think about it, especially when we're thinking about techniques like bias mitigation, it can be applied at several stages of that AI lifecycle, right? Same goes with differentially private analysis, which can be applied early on in the lifecycle or once you've generated the results. And again, each of these techniques can be applied in different parts of the lifecycle. And I really want to encourage that we should be thinking about applying AI ethics in a way that is spread out throughout the lifecycle and is not just merely checkbox-ticking activity where we say, "Hey, I applied bias mitigation so we should be good, right? [laughter] There's nothing else to be done."

ABHISHEK : 12:58

And I think perhaps, again, a close analog is something like cybersecurity, right, where if you go back to three decades, it used to be something that was a gating mechanism that happened at the end, right? Same goes even with QA for software engineering where it used to be something that was done as a gating mechanism at the end of the development lifecycle. But what we realized was that, "Hey, that doesn't really work," right? "We need to start pushing that further upstream and having everybody practice that a little bit more," which basically meant that we started to get into places where unit testing became something that's very common. If we talk about cybersecurity, secure coding practices is something that became quite common so you now, as a part of anyone joining, let's say, Microsoft will take at least an intro course to understand what are basic secure coding practices, whatever your language of choice, right? C Sharp, Python, whatever else you use. And I think that's an important way to look at it, especially when we're talking about any of these techniques because the specifics will vary based on whatever domain you're operating in, whether you're working in language, or vision, or time series analysis, whatever, right? The techniques will have slightly different variations and implications. But the overarching principle of utilizing that MLOps mindset means that you're being thorough and not leaving gaps behind.

ABHISHEK : 14:28

And the reason I say that especially is, one of the fields that I-- one of the subfields, I should say, within AI ethics that doesn't get as much attention is machine learning security. And the most common manifestation of that is using adversarial examples to trigger misbehavior from the system, let's say, right? And what's interesting is I think we, at the moment, don't have enough of an emphasis on realizing and acknowledging that machine learning security is sort of the foundational tenet of AI ethics because if, let's say, with all good intentions, you've applied some bias mitigation techniques at the start of a lifecycle, you hope that the results won't be as biased as if you had not applied the technique. The use of adversarial examples can, for example, through data poisoning trigger things that still produce biased outcomes even after you apply bias mitigation. So it almost renders that whole effort ineffectual just because you didn't think of machine learning security as something that you have to do. And so, again, if you take that lifecycle view, you can now see all of these various pieces. So we're talking about interpretability, we're talking about accountability mechanisms, [various?] technical or organizational bias mitigation, privacy, transparency. All of these ideas then fit as pieces of the puzzle in that lifecycle, which means that they become mutually reinforcing and comprehensive and holistic, leaving behind few gaps versus again, if we just think of it as a checkbox-ticking activity, what we end up doing is leaving behind these residuals in a sense that will come back to bite us at some point later. We don't know when. And that's the other thing, right? We just don't know.

ABHISHEK : 16:21

And it just, I think, reduces, ultimately, trust that we can have in these technologies. And trust, again, I think, is a big word, has a lot of implications. But trust, I'm perhaps just even talking about it in a narrower sense from a reliability perspective that, "Hey, is the system going to perform within certain boundaries that we expect it to perform?" And the boundaries shouldn't just be performance boundaries but should also be boundaries in terms of some of these ethical values, [various?] bias, or fairness, etc.

SUSAN: 16:57

Right. Right. And I think that's a really important perspective, this idea of integrating ethical approaches and some of these strategies throughout the entire lifecycle. It makes me think of developing some sort of slogan like AI ethics is everybody's responsibility and designing some propaganda posters or something so that people start thinking of it as integrated throughout rather than just something that is done by one person somewhere along the way. Yeah. I think that's a really interesting nuance.

ABHISHEK : 17:22

Exactly. And in fact, as you mentioned that, I think the other thing that we also need to recognize as we're talking about AI ethics being the responsibility of multiple people is not just to think about this as just allocating that responsibility without having corresponding accountability mechanisms where-- otherwise, again, what ends up happening is it's diffusion of responsibility, right? If everybody's responsible, then nobody's responsible [laughter] unless you hold them accountable to it. And that's why I think, again, having this lifecycle perspective is important. And one of the things that has also come up in numerous conversations is, yes, there is a lot of focus on-- and it's a little bit bizarre. Perhaps I'm going off the track here, but I think what's interesting is that there's perhaps a huge macro focus in terms of, yes, we need to think about AI ethics principles, etc., not providing enough nuance or granularity. And then, on the other hand, you have a hyper-focus on specific techniques, right?

ABHISHEK : 18:30

So people will talk about SHAP or they'll talk about LIME, or they'll talk about all of these very specific techniques, which I think is, again, a bit of a problem because then that disconnects it from the larger picture, and not just the larger picture from the AI lifecycle, which is the technical aspect to things, but also from, how does this fit within your organization from an organizational processes perspective, right? And I think that's where, at least from the conversations that I've had, I've seen the most amount of failures because if you try to do something that's too orthogonal or counter to existing organizational processes, you will face a lot of resistance, one. And two, just the uptake goes down quite a bit because you yourself might be extremely passionate and invested in making this happen but that's not the case with everybody, right? And to each their own, right? Our goal, ultimately, is to get everybody on board and realize that this is something that's worth doing. But while that's still not the case, we should make it as easy as possible for them to do this, right, because as you were saying, that we want to have a lot more people take on this responsibility. It's a little bit like thinking about our environmental duties, right, in terms of being more green. If you make things very hard, people are not going to care as much, right? And it's a little bit similar when it comes to some of these impacts of AI where developers are sort of isolated today unless you're working, let's say, at a small start-up or SME, or you're in a customer-facing unit, you tend not to see some of the immediate impacts, right?

ABHISHEK : 20:20

And there's perhaps even a further disconnect for people doing fundamental research who do some of this work and then it might be used in other ways that they did not anticipate or have not been trained to anticipate. And that disconnect both in terms of the immediacy of the impacts and the time horizon within which it happens, I think, is something that's a bit of a problem there, right? So coming back to it, I think when we're talking about these organizational processes, I think that's an important part of how we go about implementing these ideas in practice. And to that end, I would say I think the processes that we have in Microsoft have been quite cognizant of that in terms of making sure that it's a part of our natural workflows when we're trying to implement these ideas in practice. To give you an example, and perhaps for those who are listening and are interested, the World Economic Forum actually published a case study on Microsoft's responsible AI practices. And I guess observe that I use the word practices and not principles because we've had the principles out for a long, long time, really, but we've been quite heavily invested and focused on our practices to really bring this to the forefront.

ABHISHEK : 21:37

And of course, I can't do it justice in terms of describing everything because there's a lot there, so I would encourage folks to check it out. But what I would say is that we've got, for example, the Office of Responsible AI whose sole duty or whose sole mandate is to put these ideas into practice, integrating them into the existing organizational processes, creating material. Creating a Responsible AI Champs program is an example where you have these touchpoints, these contact points for people to go and ask questions within their organization, so people they're already familiar with rather than having to go and hunt for, "Well, I have this question. I don't know what to do," so.

SUSAN: 22:17

"Where's the ethicist?" [laughter]

ABHISHEK : 22:19

"Where's the ethicist? And what should I do? We have these principles. I don't know how to implement them. I got to deliver this project by the end of next week."

SUSAN: 22:26

"What now?" Yeah.

ABHISHEK : 22:27

And all of these, these are very real things that happen for practitioners out in the field. And I think just being sensitive to how we as practitioners face these issues in the real world, I think that's, for the lack of a better word, the empathy that we need to have as we are going about all of these ideas to really make them actionable, to really actually see them be put into practice.

SUSAN: 22:55

Well, this is such an interesting point and we'll definitely try to link to that case study that you mentioned in the show notes so that folks can find that. So it seems like a really, really important issue, this disconnect between the micro-level techniques like you mentioned, and the larger philosophical principles. But where's that intermediary step of figuring out how to make it work at an organizational level? That seems just super, super important. So definitely something folks will want to think about. Cool. So I want to go actually maybe to that philosophical level for a moment because I was really interested to read about your work with the Montreal Declaration for Responsible AI Development, which is an awesome name, and I believe that came out in 2017. There were some interesting aspects of that that talked about protecting human well-being and ensuring privacy, not causing division, and trying to contribute to social equity. So we're four years out from that now, so I'm curious how you reflect on that and what's happened since then. Do you think that we are getting closer to those goals or are we moving in that direction? I know that's a huge question, but [laughter] any particular aspect of it you'd like to focus on?

ABHISHEK : 24:04

Yeah. I think that's the trillion-dollar question, right? [laughter] In the sense that people talk about the impact that AI is going to have. And you can look at all of these consulting firms that put out reports that talk about the trillion-dollar impact that AI is going to have on the economy. I think it's a question of that magnitude in the sense that I was involved in the very early days. In fact, I was involved in the creation of that document when it was just a two-page French document. And I met up with the creator who graciously walked me through the French document because my French is limited in its technical fluency, let's say. So we sat through it, we worked through it. And in fact, as a part of the Montreal AI Ethics Institute, we organized seven public consultation sessions that actually fed into the development of that document over time. The goals, I think, are quite comprehensive in terms of coverage, and they provide great North Stars in terms of what we should be looking for aspirationally. But I guess it still falls in that same bucket where all of these ideas are great but we need a bit more nuance, right? And I think the Montreal Declaration for Responsible AI does go a little bit further in the sense that for each of these principles, it does provide you some questions to think about, which is a great way to make that a little bit more concrete.

ABHISHEK : 25:29

And certainly, I would say that if one was to be able to answer those questions or achieve adherence to those principles, those would be hallmarks of a responsible AI system. Has the field been able to achieve those? For the most part, I would say no, unfortunately, in the sense that, again-- and I participated in so many conferences and gave a lot of talks and did a lot of panels, etc. I still see that there has been a greater degree of emphasis on debating and refining, perhaps, some of the nuances around these principles which are required. Yes, I completely agree with that aspect, but also, what I would like to see more of is for people to try these ideas out in practice, because a lot of the times when you try something out in practice, we realize that, hey, this doesn't work in its current form and we need to do something different or we need to iterate. And I think perhaps that's also-- I was talking about my sort of entrepreneurial spirit in a sense that that also is a mindset that I bring to this, in the sense that when we talk about AI being data-driven, why aren't we being data-driven about some of these ethical AI practices also, or these ethical AI principles? In the sense that let's, as an organization, talk about, "Hey, I tried out this principle set or set of guidelines and you know what? X, Y, and Z worked, A, B, and C didn't. Here's what we tried to do to get A, B, and C to work, which led it to become D, E, and F. And guess what? We kind of get a good degree of adherence to these ideas but we need to keep refining."

ABHISHEK : 27:17

And I might be wrong because I'm just a single person and there's only a limited perspective that I would have on the field, but I would encourage folks who are listening in to leave comments and mention if they've seen case studies like this which talk about people actually trying these principles out in practice and where things have worked, but more importantly, where things have not worked because I think that's where we'll get our lessons from, right? That's where we'll get ideas from an operational perspective in terms of how any of this is going to really sort of materialize in practice and how this is going to work in practice. And to me, what's inspiring is the work that Guillaume Chaslot, who used to work at YouTube before, and he created this thing called AlgoTransparency. When we talk about all these rabbit holes, feedback loops, the vicious cycles on YouTube that lead to a lot of polarization, okay, can we get a bit more nuance or concreteness in terms of, how does that actually materialize in practice? Can I try that out myself so that I can get a sense for where and how this actually materializes? And you as a tool provider, as whatever, as a researcher, show me quite concretely in my context how that manifests itself, right? And AlgoTransparency helps to do that in the context of YouTube, which I think is very interesting because it makes the issue a lot more real, right?

ABHISHEK : 28:47

Again, we don't have a dearth of people today, I would say, talking about these issues. Certainly wasn't the case when we started in 2017, which led me to start all of this in the first place. But I think work like that really helps to bring some of these ideas to the forefront. Another thing that I've seen that has been particularly heartening, I think, is leveraging the power of community, right? In the sense that when we're talking about these ideas, these principles, going back to the genesis of the Montreal AI Ethics Institute, what was interesting was that it wasn't perhaps a deliberate choice. It was just that I thought that it would be valuable to listen to other members of the community. And fast-forward to today, of course, a lot of people talk about bringing in community stakeholders. But where does this lead to concrete results? As an example, you can look at this group called Masakhane NLP. And what they're doing is they're utilizing folks from the community who have expertise in language, who are NLP practitioners, and bringing them together to create more well-performing NLP systems for low-resource languages, right? And so they operate in an African context. And some of the work that they've done has earned them the best paper awards at a lot of the NLP conferences, right? And it's a fully community-driven initiative, right? Which speaks to the value that a diverse community can bring, especially when you're looking at something like low-resource languages. There are fewer speakers of these languages, the data sets are not as pervasive.

ABHISHEK : 30:27

So they're working from the ground up trying to address these issues in a manner that leverages the experiences, the capabilities of people from around the world, which also makes it easier to achieve. Because if you were to now hire a team to work on a specific African language, that requires a lot of coordination, requires a lot of effort, money, and all of these things, whereas this community-led effort has naturally brought people together who are interested in these issues and working on them together. Of course, now the people behind that initiative are to be given credit for coordinating the activities of that community, but it just goes to show as a concrete example that it's something that's possible. And I think that's what we need to start doing with the field of AI ethics as well. If we were to create a community of practitioners who faced real organizational challenges, then we can move from these very lofty principle sets, ideas, like the declaration, into something that's a bit more actionable.

SUSAN: 31:34

Do you think that the barrier to that happening right now is just in terms of people's capacity and time and ability to engage in those sorts of things? Do you think it's around not wanting to share the things that they're working on or share some of those failures that they might have experienced? What's keeping people from being more engaged in that communal effort, do you think?

ABHISHEK : 31:56

So my speculation is that I think we just generally have an aversion to failure, right? And it's a more macro comment than just in the field of AI because we're all chasing that state-of-the-art, right? And nowhere is that more evident than in conference publications, right, where you're just constantly chasing that extra 1% because then you get a [inaudible] paper on your CV or academic record and whatever else that that brings you, right? Which is dubious, in my opinion, that that brings much value. In fact, what would be interesting-- and to that effect, with a few colleagues, we organized a workshop at the MLSys Conference, I think two weeks ago now, that was titled JOURNE. And the goal of that conference, really, was, how do we think about some of the failures, right? How do we think about some of the ways in which we arrive at these results, right? In the sense that you don't always have a straight path getting through to success, right? It's a meandering path that gets us to the place where we come up with the state of the art, right? So JOURNE, which basically stands for your Journal of Opportunities, Unexpected Limitations, Retrospectives, Negative results, and Experiences, [laughter] the reason it's a mouthful is because we wanted to capture that messy, complex journey and embrace that, and provide a platform for people to come out and say, "Hey, here are things where we failed at miserably. Well, here's what we learned," right? And normalizing sharing of failure, I think, is so important.

ABHISHEK : 33:39

And again, I would encourage folks to check out some of the papers that were published as a part of the workshop, some of the talks that were given. It was heartening to see some of the really, really well-respected researchers in the field talk about their journey, talk about where they have experienced failures, and how that led them to their purpose in terms of what they're pursuing now. And so I think just normalizing a sharing of failure, in fact, taking pride in our failures. The reason I say taking pride is because it shows to me that you actually tried, right? What other proof do we have, right? Is that I tried and I failed. That's okay. You tried. That's fantastic, right? And I think that's one of the big barriers, I think, in the field today.

SUSAN: 34:30

Yeah. It reminds me of the people who have created kind of the CV of rejected papers where they list all of the things that they submitted that didn't get published and they've shared that to say, "It's okay. At least I'm out here putting stuff out there and I'm trying. And these things may not have actually found their audience, but at least I'm doing the work." So I think that's a really good point about taking pride in our failures. I'm going to work on that myself. [laughter] Awesome. So one of the other things I was curious about as I was poking around on your website and looking at your work, I noticed that you have put out to the world an offer to do one-on-one ask me anything sessions on AI ethics. And I'm curious if you've had folks taking you up on that, and if you've had any interesting conversations.

ABHISHEK : 35:14

Yeah. In fact, the response to that has been heartwarming that people do find value in doing something like that, and yeah. And the reason for doing that ask me anything more so than positioning it as me being an expert or whatever that means is more so just as a sounding board for sharing what some of your concerns are, and then me sharing my experience where I may or may not have experienced something like that and some ideas that I have that I think work, which I've seen work, and other things which I would really like for some folks to try out and give me feedback if they work, right? And I think one of the things that's been interesting that's come out of these conversations is how a lot of the challenges are shared across the types of organizations that you work for, the kind of work that you do, and the domain that you operate in. And it was so interesting that this sort of hunch that I had, that organizational constraints actually play perhaps as big if not a bigger role than some of the other technical issues, which I think are more tractable, is that that's something that's shared across all of these people where no matter whether you're working at an academic research lab or you're working at a start-up where you don't have that much funding to invest effort into this, or you're looking at a large corporation where you run into perhaps misalignment of incentives in terms of what should be prioritized, the organizational constraints were something that started to jump out as a shared problem, which I think is interesting that it's heartening to see because this is something that has rich literature already, right, in terms of change management, in terms of folks have already studied organizational constraints and the clash that sometimes that has in terms of achieving the mission, vision, the values, or adhering to the values that an organization has. So that's great.

ABHISHEK : 37:24

The reason that's great is because we can borrow from that, right? So we don't have to again, start from scratch. And I think that's again, one of the other things that sort of came out from these conversations is speaking with folks across these artificial boundaries, right? So take up this conversation around AI ethics if you're a data scientist with someone who's a business executive, right? Someone who's responsible for your business unit and have a heart-to-heart with them in terms of what you think is valuable and what's preventing that from materializing, right? And in one of the conversations, it was interesting that the person thought that this was something that needs to be addressed more so from a technical perspective. And my advice was, well, yes. I mean, of course, because ultimately it is a technical endeavor. But a lot of these technical measures that we have today, unless you're a fundamental researcher, you're perhaps not going to change the techniques themselves. So you have your pick of various techniques that are available, you take something and you put it into practice. What is more important in your own context is then to talk to the other stakeholders in your organization and then don't hesitate really to speak with people who are in your legal department. How many times do you as a lead designer go and talk to someone who is in the legal department? You don't really, right? And neither do they.

ABHISHEK : 38:52

And it's not just your own comfort. I think it's also our own openness to having those conversations, so making the other person feel welcome and not feel for the lack of a better word inferior in terms of their own knowledge, right? A lot of people-- and it's true. I mean, of course, it's a technical field, right? So there are specifics that people don't get, which is fine, right? Let's make an effort to extend a, quote-unquote, "an olive branch" and explain to them in their vernacular what some of these things are and why some of these problems are surfacing, right, and how are they manifesting in our context? And understand from them, right? Hope that they, from a legal perspective, would explain to you what the law stipulates, right? I mean, heck, legally, this is super complicated, right? Half of the words don't make sense, [laughter] right? Right? And how do we make that easier, right? So I think it's basics of human communication. It's basics of organizational constraint. And guess what? Again, we have tons of literature out there. There are tons of people who've done research on this. So let's speak to them, right? Let's learn from them. Because that really is the way we're going to move forward.

SUSAN: 40:08

Yeah. So much of what you're speaking to is around us constraining ourselves and not wanting to feel awkward, not wanting to feel like failures. [laughter] But really breaking down those barriers, having those conversations that might feel initially awkward or unusual, I mean, that seems like part of the way forward, as you're saying. So great points. We have limited time left. So I wanted to just get a couple of other questions your way here. Just to go to that micro-level that we've alluded to a couple of times here, you have a section in your newsletter and on your website for the AI Ethics Tool of the Week. So do you have a couple of favorite tools that you would recommend to data scientists and others for actually taking action on these AI practices?

ABHISHEK : 40:51

Yeah. Yeah. Absolutely. I think the biggest one that's been of influence and that's seen me transform my own everyday programming, let's say, is the use of PyScaffold and the Cookiecutter templates that you can integrate with PyScaffold, which actually, was a discover-- well, discover. Which actually, was the thing that I found out incidentally because the Cookiecutter templates are fantastic, right? They, for those who are not familiar, provide you with a standardized structure for your projects. And I'll come to why this is relevant to AI ethics in a second. But combining that with PyScaffold, what it does for your AI project, it provides you with a strong degree of consistency and well-defined structure, not just for your team and different projects that you're probably going to embark on as a team over time, but also across your organization of this as a practice that is standardized. And, of course, you can create custom templates as well. To me, this is super important from an AI ethics perspective because it fits quite neatly into that whole [analog's] mindset. And perhaps the listeners and you observe that I'm a lot more inclined towards processes and having things that help to standardize things. And it's not because I'm a control freak or I have OCD, but it's more because I think it lowers the barrier to putting these ideas into practice because you don't have to think, right? Someone's done the hard work of thinking how this should be done in a way that is comprehensive. So you're not always looking over your shoulder and thinking, "Hey, did I miss something? Is there something else that needed to be done?" Like, "Hey, someone's already done most of the work. So what I can do is just start from that," right?

ABHISHEK : 42:44

And so I think PyScaffold with some of these Cookiecutter templates is fantastic for getting that degree of consistency. Another tool that I found to be particularly useful is this tool from DrivenData. It's an organization that puts out AI tools, and they've got this thing called Deon. And so Deon is a way to create checklists for AI ethics and it comes with a starter checklist again, in the interest of just getting off the ground with something without having to think too much about it. But what I've done with that checklist more interestingly is to tailor that because you can provide YAML templates as an input and generate your own custom checklist, is to generate checklists that are tailored to each stage of the lifecycle. When you start to put a checklist like that together, it does two things for you. One, again, it helps you be more comprehensive, avoiding things that you might have forgotten. The second thing is that it puts these ideas front and center for you, right? It makes you think about them consciously because when you're creating that checklist, you can't be vague because if someone has to check off an item on that checklist, they need to know-- when we talk about in agile, what is the definition of done, right? This sort of helps you manifest that in a very concrete way, saying that, "Hey, I've tried techniques X, Y, and Z. These are the results that I got. This was the degree of balance that I was able to achieve. This was the trade-off that I got in terms of accuracy," or whatever. Any of these sort of things that you would want to track, you put them down as explicit items on the checklist.

ABHISHEK : 44:29

And because we've broken down the checklist in terms of the stages of the lifecycle, you can link to the items, you can cross-reference the items. The fact that the checklist is in a YAML template where you can version control it, it in fact generates Markdown outputs, HTML outputs, PDF outputs. It can also generate artifacts that you can share with your non-technical stakeholders, which is also very useful.
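
[Editor's note: The stage-by-stage checklist idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Deon's actual API or default checklist: the `CHECKLIST` structure, the stage names, the item wording, and the `render_markdown` helper are all hypothetical. It shows how a checklist kept as plain data can be version controlled and rendered into a Markdown artifact with stable item ids that can be cross-referenced.]

```python
# Hypothetical lifecycle-stage checklist. The stage names and items are
# illustrative assumptions, not Deon's default checklist or its API.
CHECKLIST = {
    "title": "AI ethics checklist by lifecycle stage",
    "sections": [
        {
            "id": "A",
            "title": "Data collection",
            "items": [
                "Documented how the data was captured and which assumptions were made",
                "Recorded known gaps or missing information in the data",
            ],
        },
        {
            "id": "B",
            "title": "Modeling",
            "items": [
                "Listed the techniques tried and the results of each",
                "Recorded the accuracy trade-off accepted and why",
            ],
        },
    ],
}


def render_markdown(checklist):
    """Render the checklist as Markdown with stable item ids (A.1, A.2, ...)."""
    lines = [f"# {checklist['title']}", ""]
    for section in checklist["sections"]:
        lines.append(f"## {section['id']}. {section['title']}")
        for number, item in enumerate(section["items"], start=1):
            # Each item gets an unchecked box and an id that can be cross-referenced.
            lines.append(f"- [ ] **{section['id']}.{number}** {item}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(render_markdown(CHECKLIST))
```

[Because each item is written as a concrete, checkable statement, the person ticking it off has an explicit definition of done, and the same data could be rendered to other formats for non-technical stakeholders.]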

SUSAN: 44:58

Terrific. Oh, these are great recommendations and we'll be sure to put some links to those resources in the show notes too so folks can check them out. Great. So one question that we always ask on Data Science Mixer that I'm going to ask to you as well, we call this the alternative hypothesis, and so the question is, what is something that people often think is true about data science or about being a data scientist, working in data, that you yourself have found to be incorrect?

ABHISHEK : 45:26

Yeah. [laughter] Maybe this doesn't come as a surprise to the listeners of this broadcast, but I think a lot of people think that data science is just cool research. Finding new models. Turns out that's not the case, right? [laughter] I mean, of course, there are people who do that work, right? It's not to say that that doesn't happen. But there's a lot more work that goes into this whole process before, right? Before you even get to the modeling stage or preprocessing of the data, the cleaning. And I'm sure everybody has gone through those stages, right? And I think one thing that people have a conception that this is dirty work or this is grunt work or this is unnecessary work even, maybe. And maybe they sort of feel discouraged by that, right? Especially people who are entering the field have all these sort of rose-tinted lens on what it's supposed to be and then it turns out that it's not, right? It's like 80 or 90 percent just what they call grunt work. But what's not?

ABHISHEK : 46:30

And I think we need to start debunking that a little bit, being a bit more transparent about perhaps what our day-to-day looks like, right? And again, this can come from some of the folks who are more senior on your team if you're in a corporate research or a corporate applied team or if you're in an academic research lab, etc. Wherever you are, just being more forthcoming about that. And I think, again, just documenting the real value that you get from doing all of this, quote-unquote, "grunt work." At least from experience what I've found is that that yields so many more insights than trying out a whole bunch of models and finding which one works better. It actually doesn't do as much for me as just really getting in with the data, right? And mucking around and seeing where there are missing pieces of information, why certain assumptions have been made, how that data was captured. Just really getting in with the details, I think, helps to unearth a lot more insights that I've found in the long run over the AI lifecycle actually leads me to build better products than just being hyper-focused on, "Hey, I'm not going to deal with any of that data processing stuff because that's grunt work. That's unsexy. [laughter] So I'm going to focus on the modeling." Yeah. And I think that's something that perhaps needs to be debunked a little bit more because I still see new people coming into the field who are disillusioned, disenchanted, discouraged by this, so yeah.

SUSAN: 48:02

Right. Yeah. Yeah. I think that's certainly something I've heard other folks say as well. And one of our recent guests actually said that she thinks of cleaning her data as learning her data, which I thought was a nice way of reframing it a little bit at least. But certainly, it still doesn't seem like the most glamorous thing that folks are doing every day. So yeah, interesting point. So I know we're short on time now. Is there anything else that you would like to add to what we've talked about that we haven't gotten a chance to discuss, or?

ABHISHEK : 48:34

All I would say, I think, is that when it comes to responsible AI, it really is something that we should take to heart. What I would encourage the listeners of this podcast to do is to think of this as another dimension to your AI practice in the sense that it'll help you distinguish yourself, right, if nothing else, if you don't care about any of these values, which I hope is not the case. But if that is the case, just think of it as adding another dimension to your technical profile. Something that will help you distinguish yourself, something that'll help you become more effective. And I think even starting with that mindset and then evolving into a place where you really start to value this is, I think, a great starting point for all of us in the field. It really comes down to how many of us think this is worthy of our time and then making active efforts to put this into practice. The last thing I would say is, start moving from principles to practice, please. I think we've had enough theoretical discussions. Try out these ideas and share your failures. Tag us, let us know what worked, what didn't, so that we can all learn and grow together. [music]

SUSAN: 49:46

Thanks for listening to our Data Science Mixer chat with Abhishek Gupta. Just a quick side note before we wrap up, Abhishek mentioned the Masakhane NLP project where a community is working together to strengthen NLP research on under-resourced African languages. We're excited to have Vukosi Marivate, one of the researchers, join us on an upcoming episode, so watch our feed for that. And be sure to join us for this week's cocktail conversation on the Alteryx community. It's always tough to admit to failures, especially publicly, but Abhishek talked about how important doing that is to advancing our shared knowledge. In your own data work, have you had an experience where sharing one of your own failures ultimately led to a positive outcome, where enduring some temporary awkwardness led to a later success? Tell us about that experience. Maybe you'll inspire someone else to be more open about their own failures and help us all move forward. Share your thoughts and ideas by leaving a comment directly on the episode page at or post on social media with the hashtag data science mixer and tag Alteryx. Cheers. [music]




This episode of Data Science Mixer was produced by Susan Currie Sivek (@SusanCS) and Maddie Johannsen (@MaddieJ).
Special thanks to Ian Stonehouse for the theme music track, and @TaraM  for our album artwork.