Alter Everything

A podcast about data science and analytics culture.

As cyber threats and data breaches become more sophisticated, so must our defenses against them. Breaches in our cybersecurity don't just impact businesses; they impact everyday people who interact with these businesses. We talk with Brian Vallelunga, Founder and CEO of Doppler, about the devastating impacts of data breaches on individuals and organizations, and practical strategies to protect your digital secrets.

Episode Transcription

Ep 161 Data Governance and Data Breaches

[00:00:00] Megan Dibble: Welcome to Alter Everything, a podcast about data science and analytics culture. I'm Megan Bowers, and today I am talking with Brian Vallelunga, one of the founders of the secrets management company Doppler. In this episode, we chat about his journey starting the company, the why and how of secrets management, data governance best practices, and more.

Let's get started.

Hey Brian, it's great to have you on our show today. Could you give a quick introduction to yourself for our listeners? 

[00:00:33] Brian Vallelunga: Yeah, great to be here. I'm Brian, one of the founders of Doppler. Doppler is, in short, a secrets manager. We help developers and engineering teams manage their API keys, database URLs, and other secrets that are basically the keys and the locks to their data.

So it's right on the topic of data governance.

[00:00:52] Megan Dibble: Awesome. I'd love to start off with just your founding story for Doppler, how you started it, what made you wanna start that company. 

[00:01:01] Brian Vallelunga: The origin story has a few twists and turns. I've always been an entrepreneur at heart. Doppler is either my seventh or eighth startup.

It's a little hard to count at this point. There were a lot of failures before; this is the one that worked. But yeah, it all started when I was at Uber. I was on the safety team, and while I was there I was working on a side project: a crypto machine learning marketplace, kind of all the buzzwords in one.

Unintentional, for real, but that's how it was. This was a very hard project to get off the ground. It felt like pushing a boulder up a hill: you move one foot forward and slip five feet back from exhaustion. It was just really hard. At some point I was frustrated, and I decided to take a trip to Mexico to take a break from it all.

The rule, or the deal I made with myself, was that I would take a break and not think about it at all. I broke that promise the first day I was there, and I realized that, maybe it was me, maybe it was the market, but I was never going to be able to get this thing off the ground.

I'd been working on it for about eight months at that point, and it didn't feel like I was making any real, tangible progress. I've always been inspired by other founders, kind of like how other people track basketball players or baseball players and their shooting percentages and all that stuff.

I do that with founders, and one founder I really love in particular is Stewart Butterfield, the founder of Slack. I think he's the best in the industry at failing upwards. For example, he creates a video game, and that video game fails, but born out of that was Flickr. Then he goes and tries again.

He says, "I'm gonna make a video game that works this time," and that fails, but born out of that was Slack. So in that same spirit, in that moment, I asked myself: What could I learn from this process? What things have I been struggling with that maybe others are struggling with too? How can I fail upwards?

Managing API keys, database URLs, and configuration was something I struggled with a lot throughout the entire project, and in a lot of different ways. So after coming back from Mexico, I went to a dinner Stripe was hosting with about 50 to 60 other founders and developers. I basically asked them: is the world broken, or am I a shitty developer? Because I just can't tell anymore.

And to my surprise, about 60 or 70% of them, I don't know, maybe a little lower, said that they had the same problems managing secrets as I did. And I was like, oh, okay, it's not just me.

What was so interesting is that the more research I did, the more I found that developers are struggling with this, mid-market companies are struggling with this, startups and enterprises are struggling with this. Everyone was struggling across the board, because there really wasn't a tool for developers or for people who just had to access data a lot. The existing tools were really designed for security teams. So that inspired me, and I built the first version of the product in three weeks.

In the fourth week we did something we called Chipotle sales. I would basically try to convince any developer to come get lunch with me at Chipotle. I'd buy them anything on the menu, and in exchange I got to rant at them for two hours. We got our first couple of customers that way.

[00:03:56] Megan Dibble: Wow. 

Take notes, everybody. That's creative.

[00:04:00] Brian Vallelunga: Do things that don't scale. Go talk with your customers, or potential customers, in person. From there we got into Y Combinator (YC), and the rest is kind of history. We raised our seed round from Sequoia right after that, and then I think a year and a half later we raised from Google.

A year after that we raised from CRV. Now we serve well over 50,000 companies, I think. So that's the abridged version of the founding story.

[00:04:27] Megan Dibble: Yeah, very cool. You mentioned secrets management being at the heart of this. We have listeners from a variety of backgrounds on this podcast, and some may not be familiar with it, so could you give an overview of what it encompasses?

[00:04:43] Brian Vallelunga: I think the first thing to talk about is what a secret is. A secret is kind of a fancy way of saying a password, but it's a password for other software systems. Kind of like how you have a password to Netflix, if your code needs to talk to a database, it needs a password too.

That password is commonly referred to as a secret. The big difference between human passwords and these machine passwords, which we call secrets, is that a human password grants you access to an individual account. My password to Netflix only grants me access to my Netflix account, whereas a machine password grants access to everyone's Netflix account, or in this case the entire database.

So they are really, really sensitive pieces of information, because they're the keys that unlock the entire digital kingdom. If you're Airbnb or Stripe or any other service that has a database or does payment processing, email, or text messaging, every one of the services they rely on, like a database, has a password that their code needs to use.

Secrets management is the practice of protecting those machine passwords, those secrets. And think about why that's important. At the end of the day, I think most people use 50 to 100 different services. I use Uber to get around. I use Instacart for grocery delivery. I use Venmo to send payments to friends. All those companies have an immense amount of data on us, and I find it a little frustrating sometimes when companies say, oh, we have terabytes or petabytes of data.

I'm like, yeah, but what is that data? It's not some abstract thing; it's real people's data, and companies are charged with protecting that data, their customers' data. The way to do that is to protect the keys to that data. That's the practice of secrets management: protecting the keys so we keep private data private.
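To make the distinction concrete: application code needs a machine password (a secret) to reach a database, and a common practice is to keep that value out of the source code and read it at runtime. The following is a minimal Python sketch, assuming the secret is injected into the process as an environment variable; the variable name DATABASE_URL and the helper names get_secret, redact, and MissingSecretError are hypothetical, and this is not Doppler's API.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def get_secret(name: str) -> str:
    """Read a secret that was injected into the process environment at runtime.

    Keeping the value out of the source code means it never lands in version
    control. The variable name is whatever your deployment chooses (assumed here).
    """
    value = os.environ.get(name)
    if not value:
        # Fail fast instead of silently connecting with an empty password.
        raise MissingSecretError(f"required secret {name!r} is not set")
    return value


def redact(value: str, visible: int = 4) -> str:
    """Mask all but the last few characters so a secret can appear in logs."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Hypothetical usage (DATABASE_URL is an assumed variable name):
# db_url = get_secret("DATABASE_URL")
# print("connecting with", redact(db_url))
```

The code refers to the secret only by name, so the same code can run in development, staging, and production while each environment supplies its own value.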

[00:06:31] Megan Dibble: Definitely. That's super important as we use more and more services. When you mention the key that unlocks all of Netflix or all of Airbnb, I can only imagine the consequences of that not being protected or governed correctly. That could result in a lot of fallout.

[00:06:51] Brian Vallelunga: I have a real story I can tell about how it gets scarier and scarier. Part of the reason I built Doppler is that I'm tired of having my data in data breaches. It's happened way too many times: with Equifax, with Twitch, with a bunch of others. And this actually happened just a couple of weeks ago, so it's a pretty recent story.

I was with my mom, and I got a call out of nowhere from someone claiming to be with Texas Customs and Border. They said, "We have a package in your name that has illegal drugs and money in it, and we're investigating you. This is real." And I was like, okay, you have my attention now. They said, "We're going to give you a couple of pieces of information to verify that we are who we say we are, and that you are who you are."

And they had all this data on me. They were like, you've lived in these last five locations, these are your three email addresses. They knew so much about me from all the data that had been exposed in data breaches. About 20 minutes in, I figured out it was a scam call. But during that time they had me legitimately scared, and they also got a few more pieces of information on me that they didn't have before, because they asked me questions and I answered them thinking they were the real authorities.

I'm sure that data can now be used against me in a future attack. So that's the cost of a data breach when companies are negligent about protecting their secrets: real people like me, you, and everyone else listening pay the price when their private data gets out. It's real, and it happens a lot.

[00:08:14] Megan Dibble: Yeah, it's definitely scary, and it can get more and more convincing as technology and AI improve. If they fooled you for a bit, and you work in this industry, that's definitely a large threat out there.

[00:08:29] Brian Vallelunga: Oh yeah. It was only once we got our lawyers on the phone with them, because that's the obvious next move when you're being investigated, that the lawyers figured it out, and that's when we ended the call.

They legitimately had me fooled for a bit. Now I know, but I can imagine someone who's not in the industry, and who thinks a good password is something like 123ABC, could really get fooled and give up a lot of information that could be used against them.

A lot of this information is the answers to the security questions you use to reset the password on a bank account, right? So all your money could be withdrawn. This is not some abstract risk. All your money could be drawn out, or you could face credit risk, where someone gets a home loan in your name because they can answer all the security questions.

There's real, real risk here. 

[00:09:11] Megan Dibble: So that just reinforces the importance of secrets management in your company. That's super interesting. I do want to shift into data governance and the lessons you've learned about it throughout your career. We have a lot of data professionals who listen to this podcast, so I'm wondering what they should keep in mind when designing data governance strategies.

[00:09:34] Brian Vallelunga: Yeah, that's a great question. The first thing that's really interesting to me is access controls. What I often see is that companies have a set of credentials for each group of permissions: here are the credentials for viewing, here are the credentials for read and write, here are the super admin credentials. Then they distribute these credentials to everyone who needs access to the data, instead of giving each individual user their own access to the database. The reason I think access should be identity based, meaning this individual user is authorized for this data set, rather than granted through a macro permission, is that you can get far more granular.

Maybe I don't need access to the entire production data set; I just need access to one or two tables in the database. I think there's a big problem today of over-provisioning. The flip side of that is least-privilege access, and I think the industry should orient around that. It's trying to do a good job of it, but there are some tooling gaps in making that accessible to companies.

If you're a professional in charge of managing data governance, I'm sure this is music to your ears: you get more narrowly scoped permissions, scoped to the identity of the user. And because they're scoped to the identity of the user, you get some nice byproducts, like really robust logging and auditing.
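The contrast between shared permission-level credentials and identity-based, least-privilege access can be sketched as a small toy model. This Python example is an illustration only: Grant, AccessController, and the user and table names in it are hypothetical, and a real deployment would enforce this in the database or an access proxy rather than in application code. It denies by default, scopes each grant to one user and one table, and records every decision, which is the logging and auditing byproduct described above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Grant:
    user: str                # one individual identity, not a shared credential
    table: str               # scoped to a single table, not the whole database
    actions: frozenset[str]  # e.g. frozenset({"read"}) or frozenset({"read", "write"})


@dataclass
class AccessController:
    grants: list[Grant] = field(default_factory=list)
    audit_log: list[dict] = field(default_factory=list)

    def is_allowed(self, user: str, table: str, action: str) -> bool:
        # Deny by default: access exists only if an explicit grant covers it.
        return any(
            g.user == user and g.table == table and action in g.actions
            for g in self.grants
        )

    def access(self, user: str, table: str, action: str) -> bool:
        allowed = self.is_allowed(user, table, action)
        # Every request carries an identity, so each decision (allowed or
        # denied) is attributable to a specific person after the fact.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "table": table,
            "action": action,
            "allowed": allowed,
        })
        return allowed
```

For example, a single read grant on an orders table lets that one user read it without being able to write to it or see any other table, and each attempt shows up in the audit log under that user's name.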

So even if something does happen, you can go back and say: these are the exact cells and rows they looked at, this is how they changed the data, and this is what they did with it. I also think that in a data governance world, it's super important that you're not just governing the data in the database, but governing it throughout all of your internal tools and platforms.

You need to observe how data moves from the database to the application layer and everything in between, because data could leak at any single point in that chain, and then you have a problem. So it's not just about protecting the database; it's about protecting the entire stack, all the way through.

So you need the permissions to follow the data all the way through, which means implementing permissions at the database layer, the application layer, and the infrastructure layer. I think that holistic approach to data governance is really needed in the industry, instead of these spot solutions.

[00:11:36] Megan Dibble: That makes a lot of sense. We've had some previous guests from the accounting industry and the audit profession, and it seems to be common practice there to have controls, so you can go back and audit the checks and balances of who has access to what. But from what you've seen, is that less common in industry at large, outside of accounting?

[00:11:58] Brian Vallelunga: I don't think it is. At Uber and at the company before that, it was just: here's a production database, or in Uber's case, thousands of databases, and you could readily query it. I'm sure they did have auditing, but it didn't feel like there were narrowly scoped permissions.

It felt almost like a free-for-all. There was a lot of trust. I'm sure some data was tokenized or redacted so that I couldn't see it, but there was a lot of data, and I think I was way over-permissioned for what I needed. In the financial space it makes a ton of sense, and there's regulation that enforces it.

But in non-regulated industries, or industries that only have to deal with SOC 2 compliance or ISO, there's no really strong governing body around that. I think the most SOC 2 asks is: make sure you have a password on your database, and make sure you can see the logs from the database.

And that's it. 

[00:12:43] Megan Dibble: It's interesting, because at Alteryx, our company, we're all about democratizing analytics, but we're also about governance. So there's definitely a trade-off: you want analytics for all, but you also want to make sure that certain data sets with protected information have varying levels of access.

I think that's something people are always trying to balance, and it can be hard to find the right balance. Do you have any thoughts on that trade-off?

[00:13:13] Brian Vallelunga: I think whatever the scope of the job is, they should have the right data to do that by default to be able to perform that job.

But there may be times they want to ask questions that aren't strictly inside the scope of their job, or that weren't accounted for in the scope; for example, the scope of their job increased. In that case, I think that's where break-glass functionality comes in. For anyone who doesn't know, break glass means you can request access to that data.

Someone can then approve it, and you break the glass for a time period. So you could get access to that data set for a month, a couple of days, or a couple of hours, whatever time window makes sense, and then you lose access again. And everything you do with that data stays within the system, so it's not leaving the system.

So you have full auditability throughout the process. I think that's a good middle ground, because now there's a review process, and if someone got access who shouldn't have, you also know who is accountable for that: the person who approved it.

So there's accountability up and down the chain. 
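The break-glass flow described above (a request, approval by someone else, access that expires on its own, and a record of who approved it) could look roughly like this. It is a hypothetical Python sketch: the class and method names, the rule that requesters cannot approve their own access, and the in-memory storage are assumptions made for illustration, not a description of any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class BreakGlassGrant:
    requester: str
    dataset: str
    approver: str          # recorded so accountability is clear later
    expires_at: datetime   # access ends automatically after this moment


class BreakGlass:
    def __init__(self) -> None:
        self._pending: dict[tuple[str, str], str] = {}  # (requester, dataset) -> reason
        self._grants: list[BreakGlassGrant] = []
        self.audit: list[tuple] = []

    def request(self, requester: str, dataset: str, reason: str) -> None:
        self._pending[(requester, dataset)] = reason
        self.audit.append(("requested", requester, dataset, reason))

    def approve(self, approver: str, requester: str, dataset: str,
                duration: timedelta, now: Optional[datetime] = None) -> BreakGlassGrant:
        if approver == requester:
            raise PermissionError("requesters cannot approve their own access")
        if (requester, dataset) not in self._pending:
            raise KeyError(f"no pending request from {requester} for {dataset}")
        if now is None:
            now = datetime.now(timezone.utc)
        del self._pending[(requester, dataset)]
        grant = BreakGlassGrant(requester, dataset, approver, now + duration)
        self._grants.append(grant)
        self.audit.append(("approved", approver, requester, dataset,
                           grant.expires_at.isoformat()))
        return grant

    def has_access(self, user: str, dataset: str, now: Optional[datetime] = None) -> bool:
        if now is None:
            now = datetime.now(timezone.utc)
        # Only unexpired grants count: the "glass" closes again on its own.
        return any(g.requester == user and g.dataset == dataset and now < g.expires_at
                   for g in self._grants)

    def approver_of(self, user: str, dataset: str) -> Optional[str]:
        # The most recent approver is the person accountable for this access.
        for g in reversed(self._grants):
            if g.requester == user and g.dataset == dataset:
                return g.approver
        return None
```

Passing an explicit now makes the expiry logic easy to test; in normal use it defaults to the current UTC time.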

[00:14:08] Megan Dibble: That makes a lot of sense. I like that as an intermediate solution for the tension between analytics for all and maybe not access for all. My next question, which we've touched on a bit already: how does secrets management impact all of us?

[00:14:26] Brian Vallelunga: When I hear "all of us," I really hear the everyday person: me, you, everyone else listening. I think the biggest insight I can share is that we trust companies with our data more and more every single day, and sometimes we don't even realize we're giving them data.

Uber's a great example of this. They ask you for your name, a password, an email, and maybe a phone number too, right? So a little bit of data, but nothing more than pretty much any other service asks for. Then all of a sudden you start taking rides everywhere. You take a ride to your partner's house, or to a restaurant, or maybe even someplace you shouldn't have gone.

And they have all that data too. Every time you use their service, they collect data on you, and that data, if it got out, could be quite damaging. It could expose where you live, where you go to work every day, what you do every day, or even where your friends and family live. You could breach your friends by accident, just by getting breached yourself.

I think that's really where secrets management impacts us all: all the data we're talking about is protected by a password, and that password needs to be kept secure. When that password isn't kept secure and accidentally gets leaked to the public, it's not just one person paying the price; it's everyone whose data that password unlocks.

So if you have a data set of a million people and that one password gets out, then a million people are now exposed or breached. That's really how secrets management impacts us all. And I really do think it's negligence when companies don't implement this. It's one thing not to know, but the second you have a security team, they should know.

The second you have a DevOps team, they should know and be aware of it. If you're not doing anything, if you just have secrets in files, typically called .env files, or you have them siloed in AWS Secrets Manager but you're not taking care of all the secrets, like the staging secrets and local development secrets, I still think that's negligence, and that's a recipe for disaster.

So that's how I think it impacts us all.

[00:16:15] Megan Dibble: What do you think about the rise of AI and how does that intersect with your focus on security and secrets management? 

[00:16:23] Brian Vallelunga: Yeah. I think it's quite scary, to be honest. Hot take, maybe. One thing that's important for anyone to understand is that offense and defense are completely different games, and offensive players have a very strong strategic advantage. Let me paint a picture. You're a defender, right? You're in charge of data governance and security at the company; you are its protective force. Your job is to build the castle walls as high as possible to make sure intruders can't get in.

You have to make sure every part of that wall is as strong as it can possibly be, with no cracks in it. It has to be as impenetrable as possible, so you have to cover the entire attack surface. An attacker has the exact opposite job: to find the one crack you missed, put TNT in it, and blow the wall down.

They don't need to take out every single wall. They just need to find the weakest point in the chain and exploit it, because if they can do that, there's usually a very big domino effect that ripples across the industry. A great example is Microsoft: Microsoft's email server gets hacked by Russia, or I guess Russian state actors, and now Microsoft has Russian state actors in its systems, even outside of email.

The email was the entry point to all the other critical systems, which is quite scary. So the hackers only need to find the weakest link, and AI is going to make that so much easier. AI will be able to find all the weak links we didn't think about, because at the end of the day, it's humans building the walls.

Humans aren't perfect, and AI doesn't need to be perfect. It just needs to try a lot of things, in a lot of creative ways, really, really fast, and that's exactly where its skill set is.

[00:17:51] Megan Dibble: Yeah, that's what I was going to say. That's what AI is good at: trying a bunch of things.

[00:17:57] Brian Vallelunga: And also coming up with really crazy solutions to things.

I think photo generation's a good example: let's put a donkey on the moon, and it can figure out how to make that image look good. That's a creative outlet; that's an iteration problem. I think AI is going to do a really good job at playing offense. Playing defense is a different game. I do think there will be more and more AI tools for defense too, but it's just so much harder.

It's orders of magnitude harder to play defense than offense.

[00:18:25] Megan Dibble: Yeah, that makes sense. So, I mean, obviously there's a call to action here for companies defending their data and investing in secrets management and everything. Do you have any takeaways for individuals who are maybe worried about their personal data or things like that?

Yeah. 

[00:18:42] Brian Vallelunga: So I have takeaways both for companies or professionals charged with protecting data and for the individual person. I'll start with the individual person. Use a password manager, like 1Password. There are others, but I think 1Password is the best in the industry, so much so that even though they have a product that competes with mine, I think it's legitimately that good.

The other thing I would say is make sure every site or service you use has a different password, and that the password is generated by the password manager. If you use the same ABC password across all your services and one service gets hacked, then all of them are essentially hacked, because your email is going to be the same across all of them.
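The advice here is to let a password manager generate a different password for every site. The sketch below shows roughly what such a generator does, using Python's standard secrets module, which is intended for cryptographic randomness (unlike the random module). The 24-character length, the character set, and the helper names are assumptions for illustration; an actual password manager also handles encrypted storage and syncing, which this does not.

```python
import math
import secrets
import string

# 52 letters + 10 digits + 32 punctuation characters = 94 symbols.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 24) -> str:
    """Return a random password drawn uniformly from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length: int, alphabet_size: int = len(ALPHABET)) -> float:
    """Bits of entropy for a uniformly random password of this length."""
    return length * math.log2(alphabet_size)


def passwords_for(sites: list[str], length: int = 24) -> dict[str, str]:
    """Generate an independent password per site, so one breach reveals only one."""
    return {site: generate_password(length) for site in sites}
```

A 24-character password over these 94 symbols carries roughly 157 bits of entropy. Because each site's password is independent, a breach at one service reveals nothing about the others. Some sites reject certain punctuation, in which case a smaller alphabet with a longer length is a reasonable trade.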

So what you really want is one password that you know: the password to the 1Password application. The 1Password application generates all the other passwords for you. That will dramatically improve your security posture right there. The next thing you should do is use two-factor authentication, or 2FA.

Now, I have some strong opinions on this. I think you should not use text-message 2FA. Banks, by default, always want to use text-message 2FA, and that is by far the easiest kind to break and hack. In the security industry, it's commonly considered not that secure at all. What you really want to use is an authenticator app.

That's where they show you a QR code, you scan it with a password manager like 1Password, and it generates a new code every 30 seconds. That's what you really want to use; it will make things so much more secure. So if you have a choice between SMS 2FA and what they'll call OTP or QR-code 2FA, use the QR-code one.

Then the last trick, which is actually pretty sneaky: a lot of services ask you security questions, like what's your mother's maiden name. You don't actually have to put your mother's maiden name in that answer. You can put whatever you want. So what I do is generate another password for that too.

So it's a 30-character-long password that makes no freaking sense. It's not my mother's maiden name; it's some random string of text. Anytime you see a security question, generate an answer and store it as another password in the password manager.
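The authenticator-app codes recommended above are usually time-based one-time passwords (TOTP, RFC 6238): the QR code carries a shared secret, and the app and the server each derive a short code from that secret and the current 30-second window, so nothing useful travels over SMS. Below is a minimal standard-library Python sketch for illustration, not a vetted implementation. It assumes the common defaults of HMAC-SHA1, 6 digits, and a 30-second step, and assumes the secret arrives base32-encoded, as it typically does in authenticator QR codes.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) for a given counter value."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte pick a 4-byte window.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(key: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based OTP (RFC 6238): the counter is the number of elapsed time steps."""
    if at is None:
        at = time.time()
    return hotp(key, int(at // step), digits)


def key_from_base32(secret_b32: str) -> bytes:
    """Decode the base32 secret an authenticator QR code typically carries."""
    cleaned = secret_b32.replace(" ", "").upper()
    padded = cleaned + "=" * (-len(cleaned) % 8)
    return base64.b32decode(padded)
```

With the RFC 6238 test secret (the ASCII bytes 12345678901234567890), the 8-digit SHA-1 code at t = 59 seconds is 94287082. Production code should rely on a maintained library and compare submitted codes with hmac.compare_digest.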

[00:20:49] Megan Dibble: Very interesting. I've never thought of that. That's a great tip. 

[00:20:52] Brian Vallelunga: So those are my tips for individuals. Now for businesses.

Really, I think there are three questions you should ask yourself. And again, my job here is not to pitch Doppler and get you to use Doppler. I just want to stop having data breaches, so let's figure this out as an industry. You could use our competitor and I'd be happy. Here are the three questions I think you should ask yourself.

First: where are all my secrets? Where are all the passwords, the API keys, the database URLs, the certificates, the encryption keys? Where do they all exist? Are they on developers' laptops? Are they in production infrastructure? Are they in code? You should have a very strong, well-reasoned answer where you can say with certainty: I have 2,000 secrets, and this is exactly where all of them live.

Because if you don't know where your secrets are and how many you have, you literally cannot protect them. You have to know what to protect first. The second question is: who has access to these secrets? If you can't answer that question either, you can't really protect them. The third question is: can I stop an attack?

Can I stop a data breach from happening? Let's say hackers somehow get access to your secrets. Are you just going to sit there and let them keep attacking you, or can you actually stop the attack? If you can't answer those three questions confidently, you have a problem.
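The first question, where all your secrets live, usually starts with an inventory. As a rough illustration, this Python sketch walks a directory tree and flags .env files plus lines that look like credentials. The three patterns are assumptions chosen for illustration (the AKIA prefix is the well-known format of AWS access key IDs); they will miss many secret types and produce false positives, so a dedicated secret-scanning tool is the better choice in practice.

```python
import os
import re
from pathlib import Path

# Illustrative patterns only; real scanners use many more, plus entropy checks.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # e.g. DB_PASSWORD=..., api_key: "...", SECRET_TOKEN = '...'
    "credential_assignment": re.compile(
        r"(?i)(?:api[_-]?key|secret|passw(?:or)?d|token)\w*\s*[:=]\s*['\"]?[^\s'\"]{8,}"
    ),
}


def is_env_file(name: str) -> bool:
    return name == ".env" or name.startswith(".env.")


def scan(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, kind) for every suspected secret under root.

    Line number 0 marks a whole-file finding such as a .env file.
    """
    findings: list[tuple[str, int, str]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath, name)
            if is_env_file(name):
                findings.append((str(path), 0, "env_file"))
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            for lineno, line in enumerate(text.splitlines(), start=1):
                for kind, pattern in PATTERNS.items():
                    if pattern.search(line):
                        findings.append((str(path), lineno, kind))
    return findings
```

Calling scan(".") from a repository root lists each hit with its file and line, which is a starting point for counting secrets and locating them. The second and third questions, who has access and whether you can stop an attack, need access records and response procedures rather than a scanner.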

You don't have to use Doppler to solve it. I'd like it if you did, but you don't have to. But do use something. Use a secrets manager; there are tons of good secrets managers out there.

[00:22:15] Megan Dibble: Awesome. Well, I really appreciate you coming on our show today. I personally learned some things, and there are a few passwords I might need to update, if we're being honest.

I appreciate you sharing about data governance and secrets management, and thanks for coming on the show.

[00:22:32] Brian Vallelunga: Yeah, it was a pleasure. Thank you all for listening to me rant for a little bit. It was fun.

[00:22:37] Megan Dibble: Thanks for listening. To learn more about topics mentioned in this episode, head over to our show notes on community.alteryx.com/podcast.

See you next time.


This episode was produced by Megan Dibble (@MeganDibble), Mike Cusic (@mikecusic), and Matt Rotundo (@AlteryxMatt). Special thanks to @andyuttley for the theme music track, and @mikecusic for our album artwork.