Data Science Mixer

Tune in for data science and cocktails.
Episode Guide

Interested in a specific topic or guest? Check out the guide for a list of all our episodes!


Alex Engler, Brookings Institution research fellow and data scientist, walks us through the present and future state of public policy regarding AI. We explore how policy, ethics and innovation interact, and what that means for data scientists' everyday work.

 

 


Panelists

 


Topics

 

 

Cocktail Conversation

 


 

Alex told us that he wasn't super excited about some of the things he had to learn early in his data science studies, at least not until he found ways he could apply what he learned to the issues he cared about and connected with a community of like-minded people.

 

What's kept you going in data science, even when the going was rough? What's inspired you to continue to learn in the face of challenges?

 

Join the conversation by commenting below!

 


Transcript

 

Episode Transcription

SUSAN: 00:00

[music] Welcome to Data Science Mixer, a podcast featuring top experts and lively and informative conversations that will change the way you do data science. We've got happy hour fun on the menu today with insights into AI and public policy. Grab your favorite drink and snack and get ready to learn and enjoy. I'm Susan Currie Sivek, the Data Science Journalist for the Alteryx community. And for this episode, I sat down with Alex Engler.

ALEX: 00:25

Excited to be here. Thanks for having me. I am currently a fellow at the Brookings Institution where I study artificial intelligence and what government should do about it. I also teach data science for public policy at Georgetown University. And I've been teaching classes in that space for a while. And he/him is totally fine with me.

SUSAN: 00:46

The intersection of data science and public policy is fascinating to me. And my conversation with Alex answered a lot of my top questions. We talked about what's happening in the US and internationally with regulation of AI and ML in the private and public sectors. We explored how future policies can live in harmony with smart and ethical innovation in the field. And Alex described how and why companies and individual data scientists should pursue meaningful introspection when practicing data science. Let's get started. [music] Awesome. Sounds great. Well, as you know, one of the things that we do with top-shelf data science is we love to celebrate snacks and drinks. So is there anything in particular that you're having as a snack or a drink right now or that you're looking forward to today?

ALEX: 01:35

So I haven't had anything yet this morning, but then I'm looking forward to a very good sushi place around the corner from me, in DC.

SUSAN: 01:43

Awesome. I bet you have really good sushi in DC, too.

ALEX: 01:46

Shockingly, I really enjoyed the sushi when I lived in Chicago, which being in the middle of the country, you wouldn't expect to be as good as it is. But it actually really stands out. Yeah, I don't know what to tell you. Not sure how they're [doing?] [crosstalk].

SUSAN: 01:57

Awesome. Sounds great. I live in rural Oregon, so our options are limited. [laughter]

ALEX: 02:04

[inaudible].

SUSAN: 02:04

[inaudible]. Very nice. Cool. All right. Well, I am very curious about how you got into data science, and how you then kind of segued into policy and civic uses of data science, and then into, basically, this think tank research institute kind of environment. So can you tell us a little bit about how your journey came about?

ALEX: 02:26

Yeah, of course. I love this because I am a data scientist, and I sort of self-describe that way. And when people hear that, they always, always assume that I was a data scientist first, and then I came to policy and governance later. And in fact, the opposite is true. I have been interested and focused on policy and governance my entire life, from college on. In graduate school, I was taking statistical classes at Georgetown and hated them. I was totally disengaged and couldn't care less, and had no idea what these had to do with policy or governance. And it wasn't until an internship at the Sunlight Foundation, specifically in a group called Sunlight Labs-- unfortunately, Sunlight Foundation is no longer around [as of?] recently, but it was an organization focused on improving technology and its use in government, as well as using data and new technical systems to open government up to make it more transparent. And when I was there, I was working with really fantastic people who were building new data pipelines, and then using that data to learn about governance. And so when I saw it in person, I was like, "Oh my God, this is not what I'm really learning." And it's a fundamental expansion on what policy analysis has done for a long time. Policy analysis has been a pretty specific subset, [with?] maybe causal inference or experiments. Right? How do we learn about what policies work [in?] a pretty narrow set of data analysis tools? What I saw at Sunlight was a dramatic expansion of that. And I got more into data analysis, and data visualization, and data science over time after that. And that's what I've spent the last 10 years doing, applying data science in public policy. What I'm doing now at Brookings is a little different. I'm now more focused on what government should do about the private sector [using AI?].

SUSAN: 04:17

Awesome. I love that. It's so interesting to hear you say that you hated statistics at first because that's certainly something that I think many folks experience when they first get into studying those topics, there's a little bit of resistance and not a lot of enjoyment there at first. But it sounds like once you saw some of those applications in an area that you really cared about, that turned things around for you.

ALEX: 04:37

Yeah, that's right. Empiricism is a hard sell if you don't know why you're doing it. It's difficult. It's time consuming. There's math. It feels slow, and you have to learn to code. And someone was trying to teach me Stata, and all of these things were not in [inaudible]. When I started to see what happens when you build new information, when you make accessible and engaging data visualizations, when, frankly, I much more enjoyed working in open-source programming languages like R and Python than I did in Stata, all of these things opened up a community and a world of insight that I found really compelling. And maybe my first important takeaway was that the realm of data that government was using was pretty limited, and we're just seeing an expansion past that now. We're just seeing more machine learning, more natural language processing, more image analytics appearing in governance. And that's good. And it shouldn't replace the things that governance has been doing for a long time, but it's an important addition. And that's sort of the value that data science brings to that realm.

SUSAN: 05:46

Yeah. Absolutely. So many different potential applications. I love hearing you say, too, that it was kind of the community of people that you got involved with as well that inspired you to continue on in the field.

ALEX: 05:46

The R community, especially-- it's hard to overstate how big the difference is between Googling things about R and finding a tutorial that says here's how you do a thing that matters to you, versus Stata, where you have a giant book and there's an old Listserv of really convoluted examples. So I do think open-source languages and the community that's built around them-- I'm especially fond of the R stats community-- open up empiricism. So there's a democratizing effect of open-source languages in that they're free and they're available to everyone. But a fundamentally different reason that they're democratizing is that the community is so welcoming, right, and it engages people. And I think if you look at the R community, there's a reason that it has a ton of not only economists and political scientists, but also genomics and biostats people, coming from all sorts of different angles to that language. I think it has a lot to do with the way in which it welcomes people who don't think of themselves first and foremost as data scientists, but think of themselves as passionate about a problem, and they need data to learn about that.

SUSAN: 07:16

Right. Exactly. Yeah. This is amazing to hear you talk about how all of these things really fed into your interest in data science and helped you find ways that you wanted to apply it and pursue it as a career. One of the things that you mentioned earlier that I definitely want to come back to as a main focus of our conversation is, as AI and ML are becoming more widely adopted throughout the public sector, there are also going to be some limitations, and some requirements, and restrictions placed on those uses and, of course, on private sector uses as well. So I was hoping you could talk to us a little bit about some of the regulations that you see being put in place recently or maybe about to happen in the US that you think are really significant, that are going to have a big impact on the trajectory of data science. I know that's a huge question, but interested to see what really stands out to you.

ALEX: 08:08

Yeah. So it's a really good question. And there are two fundamentally different issues here that we want to think about. The first is how government should use AI and data science to improve governance and its services. The Trump administration put forward new guidance that sets a timeline for federal agencies to do an audit of all the ways they're currently using AI. That should be done about halfway through 2021, assuming that the Biden administration holds to that timeline, which they don't have to, but they might. And then that audit, the inventory, should be public towards the end of the year. And there's already a good report on this out from Stanford and the Administrative Conference of the United States. And if you want to look that up, it shows you a bunch of use cases of how government is using AI for fraud detection, and for turning text submissions into codes. Right? Also, how to better sort mail. There's all sorts of applications of AI happening now. That inventory is going to be useful because it's then going to be used to help set standards on the use of AI. So we need to know how government is using it before we can start considering in what circumstances we need to set standards. This is difficult work and needs to be taken seriously, and there are some areas that are going to be controversial, but in some ways, this is a fairly clear path forward, and we know how to approach the problem.

ALEX: 09:36

A separate, harder, or at least more controversial and maybe less concrete problem is how to govern the use of AI in the private sector. There is also executive guidance from the Trump White House that came out right before the end of the administration, that they had started working on about a year and a half earlier. That was Dr. Lynne Parker, the deputy chief technology officer, who, as far as I can tell, is still in the administration and still working on these issues, as well as Michael Kratsios, who was the Trump administration CTO and seems to have left that role. And as of right now, that could all possibly change. And this is probably more relevant to a lot of the private sector data scientists out there, which is, how can the government set rules and set standards about the [private use of AI?]? And that's going to be a big question, partially led by the US and partially, it seems, by the European Union.

SUSAN: 10:33

Right. Yeah. And I'd definitely like to come back to the developments in the EU, here in a little bit. So you mentioned the audit and inventory that will be released. What are some of the key things that they're paying attention to as they're putting together that document? What's the motivating impulse behind it? Just to gather all the different uses that are going on or are they looking for particular things that are of concern or interest?

ALEX: 10:57

That's a good question. And we don't know that much yet. The document that came out from the Office of Management and Budget essentially says that they want to make AI efforts in the government trustworthy, and then lists a series of principles around the ethical use of AI that makes it trustworthy. It mentions working with related government agencies, whether it's the agencies that are implementing this work, or whether it's best practices from the National Institute of Standards and Technology, or working with the National Science Foundation. And that's all great, but very clearly, we're at the vague ethical principles stage of what precisely the [inaudible] is going to do and how it uses AI. The value of doing the audit, the inventory of use cases of AI, is really to get a conversation started about what types of standards there need to be. And I think you can reasonably expect that work to come back to a centralized office, either the Office of Management and Budget or also, quite likely, this new National AI Initiative Office. And then the sort of more tangible work will get started. At the first pass, you really tend to get these very vague AI ethical principles, and it's completely unclear how those are actually going to get implemented.

SUSAN: 12:31

Interesting. So the National AI Initiative Office, I was noticing as well that, as part of that, there's a subcommittee that's going to be dedicated to AI and law enforcement. So I thought that was an interesting move on their part to establish a group with particular attention to that.

ALEX: 12:49

That's certainly the area of artificial intelligence and even data use that's most controversial in the government. It would be great to set higher standards for the use of algorithmic systems for pre-trial release. So whether or not you release people before their trials, which some localities use algorithmic systems for, as highlighted in ProPublica's reporting on the COMPAS algorithm. That's an area where I think these systems are pretty tricky. It's really hard to tell if they're even capable of helping. A good way to think about this is that there is some research that suggests these systems are discriminatory. There's also some research that suggests these systems are less discriminatory than people. And then there's also some research that says, "Well, when you take the systems and actually apply them in practice, they don't end up helping. And the combination of the algorithmic recommendations and the judges don't really meaningfully improve decision making." My takeaway from this is we spent an awful lot of time and money on this problem, and we may be better off had we been spending that time and attention on improving services for people being released from prisons. That's just me. But there's no question that we can examine what these localities are doing, and maybe set higher standards, and make sure that truly egregious practices aren't still in use. And there certainly have been some really egregious practices at state and local law enforcement levels.

SUSAN: 14:35

Right. And yet, in the broader context of regulation of AI and ML, there have been some really interesting moves at the local and state levels to put in restrictions and regulations, stuff that hasn't happened at the federal level yet. So I wonder how you see that progressing. Do you think that there's going to be more action at the lower levels versus federal restrictions or are we going to largely see some guidance coming through these newly established organizations?

ALEX: 15:05

Yeah. That's a good question. So Illinois clearly is moving forward with prosecuting illegal uses of biometric data through its law, BIPA. And that's an interesting step forward, and they seem to be committed to protecting their citizens from having their data used unlawfully in the state of Illinois, which is interesting. And I think they're justified to say, "Yeah, listen, you took this data without their permission, you're using it in a service that they didn't consent to be used for. And we're going to say that's a crime. And if you do it, and it affects our citizens, we're going to fine you." And I can totally understand why a state would feel compelled to do that. You can imagine an issue in which, if a whole bunch of states pass a whole bunch of individualized laws like that, it gets very hard for anyone to build anything that meets all the requirements of 50 states. So there is a potential long-term concern about having everything done at the state level. This is why the European Commission is taking on digital governance and artificial intelligence governance, because it makes much more sense to have a single consolidated set of rules. So while there is an advantage, it would be nice to get some federal guidance. I don't think we should expect any federal legislation in the near or immediate future. There's just so much other critical work to be done by the Biden-Harris administration. They need to handle the COVID pandemic and the economic recovery. They are going to be under a lot of pressure for pro-democratic political reforms, all of which, to be honest, should take precedence over AI legislation, not to say that there aren't important things we could do. And so I don't expect much more than the regulatory actions in the immediate future.

ALEX: 17:00

If you wanted to be a little more optimistic, I think the people that the Biden administration are bringing in are very technologically savvy. Right? They're bringing really qualified people to improve everything from the data collection code used for vaccines and keeping track of coronavirus, all the way to the people who are now running the Office of Science and Technology Policy, which, by the way, was raised to a cabinet-level office, which is great, elevating the role that science is going to play in this administration. There are people there. I'd highlight Alondra Nelson, the deputy leader of that office, who has a long history of looking at the sociological effects of tech, specifically, for her, in modern genomics. She strikes me as a person who's going to pay a lot of attention to AI in the future. So in the short term, I think the Biden administration, at a federal level, is going to be really focused on the myriad of crises immediately facing the country. But the people they're hiring, if they're around for four years or even longer, who knows, they could have a really meaningful effect on what federal technology governance looks like.

SUSAN: 18:09

Yeah. And it's so interesting. And I think, often when we see news coverage of new regulations or, "The FTC has done X, or the OMB issued this guidance," it's kind of from a faceless agency. We don't often get a lot of insight into the people behind these things. And I guess from a very sort of naive perspective, I would be curious to know more about some of these folks' background and what their expertise is that they're bringing to this role. I don't doubt that they know stuff, but we usually just hear about it kind of in that very vague agency attributed sort of way. So it's interesting to hear about these folks having, in some cases, really deep experience. That's really cool.

ALEX: 18:50

Well, it's interesting that you say that. One thing that's certainly true about the idea of AI governance is that there is no centralized location. The FTC, as you mentioned, has done some enforcement actions. I think I approve of the most recent one, in which they actually confiscated models that were built on illegally collected data. And I actually think that's the right thing to do. But I can understand, if you're in the industry, you're not really sure who these people are and where they're coming from. It's in part because there's no long history or centralized authority on AI governance, so you're not really sure who you're supposed to be even paying attention to. It [does?] seem like this National AI Office is going to be important; I mentioned Alondra Nelson and Lynne Parker. The FTC could [feasibly?] be quite important. And then whatever your specific agency is. So if you work in employment algorithms, you might pay attention to the Equal Employment Opportunity Commission. I'm sure you already do. And if you're in finance, you're going to pay attention to CFPB and the FTC. Right? But that said, it is a little confusing, because this work hasn't been around for so long that it's obvious who to look to if you're interested, if you want to know what the federal guidance is.

SUSAN: 20:08

Definitely. So one of the things that I can imagine folks saying as they're listening to us talk about regulations and federal offices becoming involved in AI and ML development is well, wouldn't regulations and restrictions somehow impede the growth of these new technologies? Are we preventing potential positive applications of these tools by putting in regulations and ethical requirements? That sounds terrible. But are we somehow going to put a damper on the growth of these tools, especially in a very competitive global environment for AI and ML? So I'm curious how you would respond to that concern that folks may have about the growth of regulation in this area?

ALEX: 20:54

It's a reasonable question. I think if you end up with regulations whose net effect is to force companies to hire lawyers and check a bunch of boxes, then no one wins. Right? That's very obviously, to some of us in the policy space, not the goal. The goal is to improve the floor of quality of the AI systems. So what I mean by that is that there are a number of companies, some number, hard to tell, that are frankly not operating honestly in the AI space. They're making misleading claims about their services, sometimes even going beyond the literal capacity of existing artificial intelligence, basically treating it like it's magic when it's very much not. And they're not only exaggerating what their systems can do, some of those companies are also not putting the work in to go through meaningful introspection, right, to self-evaluate for discrimination or unfair practices, which then, if you're a successful company, you can amplify at a pretty large scale. And you can think about this in employment. You can think about this in the automated provision of health care through algorithms. You can also think about this in rideshare services. And this general idea of risk applies to a lot of sectors. And I don't think the role of government is to magically fix the software industry and somehow make all of these services perfect, but what it can do is enforce a floor of quality and honesty in the market. And I honestly think that if you're a company running ethical-- if you're taking the ethical application of AI seriously, then you stand to benefit from this. And I don't mean AI ethics blog posts; that is the current level of enforcement, and it is completely useless. But if you're doing meaningful introspection about your data science applications, if you're saying, "We're going to build a product that's effective, and we're going to meaningfully check to make sure that it's fair and [to a?] reasonable standard, to the people who use it," I honestly think you stand to gain from this, because it's really hard to tell the difference in the market between some of the less scrupulous systems and some of the more responsible ones. It's time consuming, and it's expensive to put the work into a really good and fair AI product. And frankly, it's easier and cheaper to not do that.

ALEX: 23:14

And currently, the role of law, especially nondiscrimination law, right, should be to make sure that there is sort of a minimum standard set for discriminatory practices in protected classes, in protected industries. And we're not seeing that happen, which means that the grifters can spend less money and less time making a worse product. In terms of its fairness, it might still seem to work reasonably well, and they're going to [soak?] up market share. And responsible companies are going to get punished. And there's no one enforcing this floor of standards. So at least in a local sense, at least within the United States, I think there's a clear argument for enforcing discrimination laws and maybe unfair, deceptive practices under the FTC. And I think there is a reasonable way to do that that doesn't dramatically raise the cost of building an AI system and, again, rewards companies that are taking self-evaluation and introspection seriously.

SUSAN: 24:16

Yeah. Absolutely. That makes a lot of sense. And I think the issue of, are we doing something that consumers will trust and consumers will feel good about, just that very basic sense of trustworthiness seems to be a great asset for companies, in the first place, so.

ALEX: 24:32

I mean, that's a great point. Right? There are a bunch of people who are fairly hostile to the use of AI systems, and it's hard to argue that they don't have a point. Some AI systems are totally fine, but it's incredibly hard to tell, as an individual, whether or not you're interacting with one that is fair to you. And fair can mean a lot of things. It can mean if you're a woman. It can mean if you're a racial minority. It can mean if you have a disability. Or even if you have an atypical facial structure, or mannerisms, or if your tone of voice is slightly unusual, or your cadence is strange, or if you have a slightly obscure vocabulary. I think people underestimate how being outside the range of expected outcomes can lead to pretty weird effects in an algorithmic system. And you might be the subject of that discrimination without really ever understanding that it's happening to you. And as a result of this, I think we do see a lot of people see stories about discrimination and problematic algorithmic outcomes and think, "Why couldn't that be happening to me?" And they're probably right. There's nothing really preventing it from happening to them. That's not to say that everyone is objectively worse off because we're using algorithms; I don't think that's true. But I do think it's fair to say that some people are right to suspect that they are being taken for a ride some of the time, and it's very hard for them to know the difference between when they're engaging with a responsible system and when they're not. And it's very hard for them to be reassured, without some sort of different accountability system, that these systems are treating them fairly. And I do think that is part of the reason we see a lack of trust in AI systems.

SUSAN: 26:22

That makes a lot of sense. [music] Alex has so many great insights into the issues around algorithmic accountability, and trust, and what this means for companies doing work with AI and machine learning. But how should individual data scientists deal with these issues in their day-to-day work? Alex and I will talk more about that after we take a short break. [music]

S3: 26:51

Hey, everyone, this is Tyler Heinl, the product manager at Alteryx working on our open-source software. Today, I want to talk about Compose, which is an open-source Python library for automated data labeling. Using Compose, users can label their time series data sets for supervised learning. Data labeling can be a super tedious process that can require a lot of ad hoc code for each project that a data scientist is working on. With Compose, we're trying to make that easy, and we remove many of the complexities associated with labeling a data set. Users will be able to label their data much faster, which will result in quicker experiment time and, overall, a better machine learning model. This library works great with Featuretools, a feature engineering library, and [inaudible], the AutoML library. You can access all of these projects by visiting our GitHub page at github.com/alteryx. Further, you can see the Compose documentation by visiting compose.alteryx.com. For additional updates and tutorials, follow us on Twitter, @AlteryxOSS. [music]
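For a concrete sense of what that windowed labeling looks like, here's a minimal sketch using Compose on a made-up transaction log. The data, column names, and labeling function are invented for illustration, and the LabelMaker parameter names are written from memory of the Compose docs, so treat them as assumptions and check compose.alteryx.com before relying on them.

```python
import pandas as pd
import composeml as cp

# Toy transaction log: one row per purchase, with a timestamp and a customer id.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "transaction_time": pd.to_datetime([
        "2021-01-01 09:00", "2021-01-01 09:30", "2021-01-01 10:15",
        "2021-01-01 09:10", "2021-01-01 11:00",
    ]),
    "amount": [20.0, 35.0, 10.0, 50.0, 5.0],
})

def total_spent(df):
    """Labeling function: total amount spent within one search window."""
    return df["amount"].sum()

# LabelMaker slides a window over each customer's history and applies the
# labeling function, producing labels ready for supervised learning.
label_maker = cp.LabelMaker(
    target_dataframe_name="customer_id",  # assumed name; some versions call this target_entity
    time_index="transaction_time",
    labeling_function=total_spent,
    window_size="1h",
)

labels = label_maker.search(
    transactions.sort_values("transaction_time"),
    num_examples_per_instance=-1,  # generate as many labels per customer as the data allows
)
print(labels.head())
```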

SUSAN: 27:56

Tyler, thanks for that info about another great open-source library from Alteryx. And now, let's get back to our conversation with Alex Engler about what individual data scientists can do to help ensure their work is trustworthy and fair. So I'd like to go back to a phrase that you mentioned earlier that I thought was really interesting in talking about companies developing algorithms and so forth, you used the phrase, meaningful introspection. And I wonder what you think that looks like on kind of the everyday level for the typical data scientist working from home these days. How can they engage in that meaningful introspection about their work or encourage that as part of their team's effort or their company's efforts? Do you have any advice for them? [laughter]

ALEX: 28:41

Super good and important question, one I'm actively trying to learn more about. There are a growing number of tools in this space, which is especially great; there are open-source libraries in Python and R for running bias audits and being a little more self-critical about the outcomes of your code and data. There's also more publicly available research about existing issues with models you might already be using. Right? If you're building a model that sits on an existing large language model, it is very much worth reading the research about what potential biases and [inaudible] might exist in that model. And so making sure you are aware of the existing packages that might apply to your tool, right, whether that's, for instance, those bias audits we talked about, or maybe looking into the explainability packages; we've seen a lot of interest in Shapley values recently; what available tools are there that help me learn about my AI system and interrogate it a little more thoroughly, is a really good first step. I would point people towards the FAccT conference, F-A-C-C-T, right, which has a lot of work coming out on this, and also a lot of work on how to rigorously audit an algorithm. So there's the technical side of like, "Is my model doing what I think it's doing? And can I evaluate whether or not it has disparate impact on subgroups?"
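As an illustration of that last question, here's a minimal sketch of a disparate impact check written in plain pandas rather than any particular bias-audit package. The column names, the toy decisions, and the 80% rule-of-thumb threshold are illustrative assumptions, not a standard Alex prescribes.

```python
import pandas as pd

# Illustrative scored data: one row per applicant, with the model's decision
# and a protected attribute we want to audit against.
scored = pd.DataFrame({
    "group":    ["A", "A", "A", "A", "B", "B", "B", "B"],
    "selected": [1,   1,   0,   1,   1,   0,   0,   0],
})

# Selection rate per subgroup: the share of each group the model approves.
rates = scored.groupby("group")["selected"].mean()

# Disparate impact ratio: the worst-off group's rate divided by the best-off group's.
# A common rule of thumb flags ratios below 0.8, but the right threshold depends
# on the application and on applicable law.
di_ratio = rates.min() / rates.max()

print(rates)
print(f"Disparate impact ratio: {di_ratio:.2f}")
if di_ratio < 0.8:
    print("Flag for review: selection rates differ substantially across groups.")
```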

ALEX: 30:14

Another thing you might consider doing is synthetic data generation. So you could build a fake set of data based on a circumstance you might be concerned about and run that data through your model and examine what happens. You could also create data that represents a potential problem you're worried about, use that to update your model, if you have a model that's updating automatically, and examine whether or not that leads to problematic outcomes. Right? And you could imagine doing that as part of an examination for model drift, which is how [your?] model might be changing over time. So that's a set of criteria we're thinking about. And I would encourage people to look up the recent papers around how to seriously audit their own algorithm. There's a recent paper called "Closing the AI Accountability Gap: Defining a Framework for Internal Algorithmic Auditing." Right? So there are papers on this specific type of work. Separate from that, you also might want to consider the sort of human data scientist role in this process. That's a little more like choosing what variables you decide to use. Right? So there's the core problem of measurement error, which is maybe less considered in the modern private sector use of data science than it is in some of the social sciences. Right? What does it mean to use employee sales or employee performance reviews as an outcome metric [in?] a prediction? Is that inherently a good and fair measurement of the thing that I'm trying to predict?
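To make the synthetic-data idea concrete, here's a minimal sketch: train a throwaway model on made-up data, then generate synthetic rows that differ only in the attribute you're worried about and compare the model's outputs. Everything here, the features, the model, and the numbers, is invented for illustration; the point is the probing pattern, not any specific tool Alex mentions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Made-up training data: two ordinary features plus a sensitive attribute.
n = 2000
X = pd.DataFrame({
    "experience": rng.normal(5, 2, n),
    "test_score": rng.normal(70, 10, n),
    "group":      rng.integers(0, 2, n),   # the attribute we want to probe
})
y = (X["test_score"] + rng.normal(0, 5, n) > 70).astype(int)

model = LogisticRegression().fit(X, y)

# Synthetic probe: identical rows except for the sensitive attribute.
probe = pd.DataFrame({
    "experience": np.full(1000, 5.0),
    "test_score": np.linspace(50, 90, 1000),
})
probe_a = probe.assign(group=0)
probe_b = probe.assign(group=1)

# If flipping the attribute alone moves the predictions, that's worth investigating.
gap = model.predict_proba(probe_b)[:, 1] - model.predict_proba(probe_a)[:, 1]
print(f"Mean prediction shift when only the group flag changes: {gap.mean():.3f}")
print(f"Largest shift on any probe row: {np.abs(gap).max():.3f}")
```

The same pattern extends to drift checks: score the probe set with each retrained model version and watch whether the gap grows over time.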

ALEX: 31:51

We're also seeing more evidence that mixed human-and-model decision making can be a problem. And so I mentioned earlier, in criminal justice, you could make a model that predicts recidivism really perfectly and fairly-- well, we haven't seen a lot of examples of that, but imagine if that was a real model. You could still imagine worse outcomes if the way a judge interpreted that information was systemically flawed. And so you do also want to have some introspection around the interaction of what model you've built and how the people who see that information might react. And so there's a ton. And this is what I'm saying, this is harder. This is harder than building a bad system. And if you wonder what I think the role of government should be, it should be encouraging and then rewarding the companies that are willing to put in this work to make their own rigorous systems.

SUSAN: 32:46

That makes a lot of sense. I think one interesting thing that you're mentioning there is, I think we often see the human in the loop sort of approach suggested as a potential way of addressing algorithmic bias, but you're saying it can actually still be used to reinforce existing human biases. Am I understanding that correctly?

ALEX: 33:05

Yeah, that's an interesting question. I think it's probably hard to talk about much of AI. It's sort of hard to talk about the entire scope of AI in one breath. It is true that direct human oversight can note and account for biases in models. That's true. For instance, if I am talking to someone who has a strong accent or a stutter, I can recognize that a transcription model that's taking their speech and turning it into text is not functioning properly. Right? That is the thing that I, as an individual, can do, and can note, and can say, "Oh, this isn't working, we need to do this by hand." Right? And then maybe you could imagine building that into the [model?] code to recognize when that's happening. But there's an example of where a human in the loop could identify a problem where someone's voice is well outside the coverage of the training set, and thus the model is doing a very bad job of the transcription. At the same time, if you were an individual receiving a recommendation from an algorithm and you do not understand the way the algorithm is working and you do not understand the way that it is coming to a decision, it is absolutely not guaranteed to help that person's decision making. They can think the model is accounting for some factors and not others. They can doubt the validity of the model in certain circumstances and not others. And that could create just as big of a problem. And that's what we saw in some of these criminal justice algorithms, in which the judges not quite understanding what the systems were doing led them to make systemically different decisions that I think were functionally just as bad as what might happen without an algorithm.

SUSAN: 35:03

Right. So no easy solution, you're saying? [laughter]

ALEX: 35:07

No. Or maybe, like, no general solution. And I do think there's an important lesson for governance here, which is that it's really, really hard to say anything meaningful about AI ever, right, in like a broad-sweeping statement. It's almost completely insane to try. And so what you'd hope to see in a governance structure at least, and what it seems like we're going to try and do in the US is, rather than pass legislation that sets a single standard for every single bit of artificial intelligence, we're going to try and work within the agencies-- and even within the federal agencies, they're going to need to work within a sector and within a use case. Right? There's a big difference between-- I've been talking about employment recently, there's a big difference between the algorithmic model that analyzes a resumé and the algorithmic model that transcribes your speech to text. And there's a big difference between those two things and the model that might predict employee characteristics. And it's not clear to me that you can set any standard or raise the floor in all three of those things at the same time. Even within one series of applications, that might require different thinking. And so I think it's going to be tricky to set any really broad standards for AI. But I do think that a sort of smart and targeted approach, where you work within industries, within applications, to identify the really important algorithms that are making big decisions about people's lives, and helping raise the floor of quality in those markets, that's something government can do well.

SUSAN: 36:49

Awesome. So a couple of other things that I wanted to be sure to touch on with you in the time that we have left, one of which, I think we were in the same webinar on Wednesday, the Stanford webinar about [Digital Services Act?].

ALEX: 37:02

Yeah.

SUSAN: 37:02

I was like, "Oh, I recognize a name here in the Q&A." So I'm really interested in the international trends that we see happening. And you've alluded to this a little bit already as far as what the European Union is doing, how they're approaching this. And as we know, these are international things, we've seen the effects of, for example, GDPR affecting tech companies around the world. So I'm sure whatever the EU does in this regard is going to have wide-ranging effects as well. What do you see happening there? What are some of the key things that you think will have the greatest effects on, presumably, mostly private sector, seems to be their main area of attention, at least large companies?

ALEX: 37:42

Yeah, that's right. So the EU Commission President, Ursula von der Leyen, came-- well, she didn't go anywhere, I suppose, but she gave a speech to the Council on Foreign Relations recently, and she called for a renewed transatlantic-- transatlantic being not such a great term. Right? In this case, that really means the US and EU, but probably there's many other countries attached to the Atlantic. Anyway, so US-EU relationship. And she specifically mentioned technology governance and artificial intelligence, specifically a human-centric approach to AI, which is the [phrasing?] that the European Union [seems to like?]. And part of the reason that that was included prominently is, under this EU Commission presidency, that's Ursula von der Leyen, they're going to release draft legislation on AI governance in the spring, sometime in the next three or four months, I'd think. And that would be legislation that categorizes some AI applications as high-risk, and then creates some set of criteria around that set of high-risk AI applications. And this does touch on the challenge I was just talking about, which is, what can you really say about all of the myriad ways in which AI can be dangerous, from employment to health, to ridesharing systems, to privacy-revealing aspects, to the natural language processing models that structure and organize the web. Right? These are really different systems. And can you say anything really effectively about all of them at once? And so what that legislation says about that world of these high-risk AI applications is going to be really important. It's possible that they'll move away from this low/high-risk framework more towards what the US is doing, which is this kind of, as I mentioned, within-agency approach. It's also possible that they'll set a pretty vague standard for what high-risk models have to do, and then it will be enforced in some more specific, sectoral-focused way. But I do think you're right that what they decide to put forward in that legislation is going to be really important.

ALEX: 40:03

It's also possible that they're going to try and blacklist some things. Right? You could imagine saying it is illegal to present an artificial intelligence as though it is a human, which is actually a ban that I would support. So I don't think there's any value whatsoever in pretending through either a chatbot or an automated [avatar?]--

SUSAN: 40:24

Avatar. [laughter]

ALEX: 40:25

Yeah, imagine like a deepfake, but it's talking to you as a customer service representative, right, through something like Zoom. You could imagine banning not disclosing that that's not a person. You could also imagine this happening in virtual reality. You have a virtual reality avatar, right, and you're not revealing that to a person. The issue here is that it's getting harder to tell. Right? It's hard to tell the difference between a [human?] customer service agent who's following a script and a chatbot. And that might not matter some of the time if you're buying clothes for-- if you're on a website that's helping fit you for a suit or something, but it might matter if you think you're getting financial advice. And so you could see them also ban some technologies, which would be interesting, because that's the thing the US is very, very, very reticent to do, even though it's not inherently a bad idea. So a couple of things they could put forward: high/low-risk AI application standards are going to be very important. And maybe if they decide to ban things as well, that would also have a big impact.

SUSAN: 41:24

So interesting. Yeah. And I guess something to watch over the next year, is that kind of the rough timeline at this point for when some of this might actually come forth?

ALEX: 41:32

Yeah. Looks like 2021 is going to be really interesting. We've got this AI legislation coming from the European Commission. We have also seen the Digital Services Act and Digital Markets Act. The Digital Services Act definitely plays a role to the extent that AI policy is a thing, which is like-- right, it's not obvious that we should even think about it as one thing. The large online platforms use algorithms, and the Digital Services Act [would?] try to open up how those are used and make some of the data available to independent researchers. And that, if you consider that as part of this conversation, would be enormously impactful on us learning how large platforms like YouTube, like Facebook, like Instagram disseminate information.

SUSAN: 42:18

That's for sure.

ALEX: 42:19

That also looks like that discussion in the European Union is going to happen through 2021.

SUSAN: 42:24

Fascinating. It'll be so interesting to see what comes out of that.

ALEX: 42:28

Yeah. Yeah. That's an interesting model for us too. I think even if we don't know what we want to do, making large data sets a little more publicly accessible-- and I don't mean putting everyone's Facebook data on the internet, I mean giving access to pseudonymized data to trusted researchers who are under a legal obligation not to spread that data around, giving them access so we can learn about the societal impacts. That's something that the European Union is really interested in doing that we should probably be giving more consideration here in the US too.

SUSAN: 42:58

That's so interesting. Gosh, we're running out of time, so much to talk about. It makes me think of Twitter just, I believe, earlier this week, deciding to open up full access to researchers in academia in particular. So I wonder if that's a bit of a preemptive move, in a way, on their part, although it's not necessarily the full kind of access that the Digital Services Act seems like it wants to request.

ALEX: 43:22

It's [laudable?]. Twitter should be commended for this. It's a little easier for them because their site is open by default, right, so the [vast majority?] of information is public except for private accounts. And so it is relatively straightforward. Facebook put a ton of time and money into a project that tried to thread the needle, where they didn't really make individual user data publicly available, but they did make large data sets available for disinformation research. And frankly, they didn't manage to quite create data sets that were valuable enough for researchers. And while maybe we're going to learn something from it, it doesn't, unfortunately, seem like it's going to be a model for opening up their data. And they may not be capable or able to do this on their own, in part because they fear the privacy-revealing risks. If they do this of their own volition, and then a ton of that data leaks, it is a disaster for them as a company, and they could potentially be fined or be sued. In some senses, it's easier for them if a government just comes in and says, "Hey, you have to make this data available to these researchers. Those researchers are legally liable for keeping it safe. The government is requiring you to do this, end of discussion." You can maybe see a better outcome and, frankly, have less deliberating on Facebook's side about, "Well, how do we do this, and how do we keep ourselves out of legal liability?" if they don't have a choice.

SUSAN: 44:48

Yeah. Wow. So interesting. Okay. One more quick question. I would love to hear a little bit more about something you mentioned in an article you wrote about developing a data scientific investigative capacity in government agencies. We talk a lot about data literacy and [upskilling?] at Alteryx, getting people to understand data and how data analysis works more deeply. So I thought this is a really interesting phrase. And I would love to hear a little bit more about what that means to you.

ALEX: 45:17

Yeah. Sure. So the important thing to understand is, right now we have a system where the oversight and accountability in the technology sector is created largely by a journalistic outrage cycle, where journalists discover something that looks really bad or may or may not be bad, and then the public gets very mad, and the company is forced to respond in some way that could be a meaningful change or could not be a meaningful change. And then typically, nothing else happens. This is the tech outrage cycle. It's not a very effective way to run a society. Now, for the record, journalists and academics are doing an enormous amount of good work. And they are documenting and discovering concerning anecdotes and examples around what might be structural and systemic problems in the technology sector. It's really hard to know the answer to that, though, whether or not that's actually happening, because typically they are limited by their access to information from the outside. There is an enormous information asymmetry in terms of what we know about the technology sector. And this is sort of the idea of a piece I wrote called The Devil Is in the Data for Lawfare. Essentially, the idea is that, in the past, if you were a researcher or a journalist, you could go to the meatpacking plants and look at what's happening. You could evaluate the products coming out of that meatpacking plant. There's a famous chemist named Harvey Wiley who did this, which led to the founding of the Food and Drug Administration. He was looking at what was coming out of these meatpacking plants and essentially saying this food and other food Americans [eat is?] really toxic. It was much more obvious how you went around and documented the problem.

ALEX: 47:12

The issue that we're seeing now is personalization, and the web really obscures this. So if you're an individual person, you can go see some small part of an algorithm, the part that [inaudible] you, but it's much, much harder to broadly see what's happening to everybody in any sort of representative way. And this problem exists in almost anything you try to study on the web. Like, what is happening to people who aren't like me? And what of those things is representative of what's going on broadly? And so despite the fact that I think journalists and academics are doing, functionally, all that they can, and they're really outperforming in their roles here, it's really hard to tell the difference between when they're right and there is something egregious and terrible happening, and when a company is being unfairly criticized by something that appears true from the outside but isn't really fundamentally true. And this is the idea-- the sort of solution to this is you enable people with meaningful independence and a sort of a societally-oriented perspective to get data access and look at this. In the example we're talking about with the Digital Services Act, that's going to be independent researchers. European Union wants to open up the big online platforms and give some of that data to independent academic researchers, and then they will say, "Hey, yes, this is a problem or this isn't the problem. We have a systemic and comprehensive way to evaluate it." And that's really valuable because it helps clear up this, like, "What should we be mad about and where should our focus be?" as well as other things like, "Who should we be fighting for not handling issues like child pornography or like terrorist content?"

ALEX: 48:51

Now, when I talk about data scientific capacity in the federal government, there's another side of this. So one is, how can we open up some of this data so we really know what's going on? And then, two, is, what happens if we have a really good reason to suspect that there is a law being broken? Right now, it's harder. Right? You can't go get all the documents in a filing cabinet anymore and then read them; you need some new skills. Right? It's a little less lawyerly and a little more scientific. And so to do this, you need agencies that can subpoena data sets. And there's a gray area on when you can do that. You might be able to subpoena models. And there's a gray area on when you can do that. And some of this has been going through the courts a little bit. So you need to be able to go get the data from a company that you suspect of wrongdoing, and then you need to be able to analyze it securely and effectively. Right? It's definitely a problem if, every time the federal government subpoenas a company, that data is very easy to hack, right, so you need real cybersecurity systems and effective and safe cloud environments, probably, for the government to analyze the data. And then you need talent. We need to be able to hire data scientists who are interested in this type of investigatory work. And that's what I mean by data scientific capacity. Right? Can we get the data through a subpoena? Can we securely hold the data? And then can we analyze it in order to provide some sort of oversight? And it turns out that these aren't really, really hard problems to solve, but you have to be sort of focused on not very exciting things. Right? It's not quite as exciting as some of the AI things that make news; it's improving federal hiring processes and the availability of cloud infrastructure to agencies, right, in a [inaudible] way. And that's probably where a lot of the immediate good can happen, and, hopefully, we're going to see the Biden administration invest some attention in that. And I think that the number of good technologists on their transition team is encouraging in that regard.

SUSAN: 50:50

So interesting. It's a whole nother career path for our aspiring data scientists out there. Something else to think about.

ALEX: 50:56

For sure. The Office of Personnel Management just approved an official title of data scientist in the federal workforce. And they started hiring--

SUSAN: 51:04

Oh, yeah, didn't exist before?

ALEX: 51:06

It did not exist. It did not exist. And this was an issue, right, because you're like, "I am a data scientist, and I don't mind taking a pay cut," and the government's like, "We want to call you Programmer Analyst III," and you're like, "Oh, that's not helpful." It shouldn't matter as much as it does. But I've had a lot of students who work in data-- who are data scientists in public policy, and it's slightly frustrating when they are going to get called a statistical programmer when they know that there is a lot of market value and value to their future in being called a data scientist. And they know they have the skills too. So it's one step, but it's an important step to draw on technical talent.

SUSAN: 51:43

Yeah. Absolutely. Wow. So interesting and so much yet to happen. Is there anything else that we haven't talked about that you want to be sure that we mention or anything else going on?

ALEX: 51:53

This field, [this?] idea of how we build meaningful governance around artificial intelligence, is not set in stone. And I think there's a number of reasonable people who want to improve and shape the world in which we use AI in our society. And the people and the companies that are really interested in finding the right balance, in how we can ensure that there is a standard set of ethical and responsible practices broadly used in this field, should be really welcome voices in the conversation. And I hope the people listening to this feel like they know a little bit more about how to get engaged and potentially how to contribute to this, because it's going to be important. I think if you imagine, like I do, that the scope of what algorithms do and the scale at which they are used is going to dramatically increase over the next 100 years, then the way we govern them, and the way society plays a role, and the public plays a role in their use, is going to be really, really important. And this is going to be a big couple of years in setting maybe the foundations of what that governance, what that role of the public, looks like. And so I really encourage the people who are passionate about this to find a way to get involved.

SUSAN: 53:05

Yeah. Absolutely. [music] Well, Alex, you gave us a great overview of this topic and a lot of really important deep insights here. So we're very grateful for you doing that. And it's been great to talk with you.

ALEX: 53:17

Sure. I'd just say that I am @AlexCEngler on Twitter, and I would love anyone interested in this sort of thing to feel encouraged to reach out. I'd be happy to talk more about it.

SUSAN: 53:30

[music] Thanks for listening to our Data Science Mixer chat with Alex Engler. Join us on the Alteryx community for this week's cocktail conversation where we want to know your thoughts. Alex told us he wasn't super excited about some of the things he had to learn early in his data science studies, at least not until he found ways he could apply what he learned to the issues he cared about and connected with a community of like-minded people. What's kept you going in data science, even when the going was rough? What's inspired you to continue to learn in the face of challenges? Share your thoughts and experiences by leaving a comment directly on the episode page at community.alteryx.com/podcast or chime in on social media using the hashtag, #datasciencemixer and @Alteryx. Cheers. [music]

 


This episode of Data Science Mixer was produced by Susan Currie Sivek (@SusanCS) and Maddie Johannsen (@MaddieJ).
Special thanks to Ian Stonehouse for the theme music track, and @TaraM  for our album artwork.