Data Science Mixer

Tune in for data science and cocktails.
MaddieJ
Alteryx Community Team

Join us behind the scenes of the trucking industry to learn how data science keeps the supply chain moving.

 

 



Cocktail Conversation

 

Have you ever taken a field trip to a specific place to inform your data science work or to better understand your industry?

 

Join the conversation by commenting below!

 


 


Transcript

 

Episode Transcription

SUSAN: 00:00

One of my favorite things is learning about something that was previously just part of my everyday landscape, something that you see so often that it just feels routine, but then you have the chance to learn about that thing on a whole deeper level. And you never look at that ordinary, everyday thing the same way again. Welcome to Data Science Mixer, the podcast that features top experts in lively and informative conversations that will change the way you do data science. I'm Susan Currie Sivek, senior data science journalist for the Alteryx Community. So seeing things in a new way, that's totally what happened for me after this conversation with Cynthia O'Rourke. And in this case, it's trucks. Trucks are everywhere, moving all the things. But after talking with Cynthia, I have a whole new appreciation for what it takes to make that system work, and especially for the role of data science in it. Let's meet Cynthia and hear more about this fascinating area of data science. [music]

SUSAN: 01:10

Cynthia, thank you so much for joining me today for Data Science Mixer. I'm so glad to have you here. Maybe we could just start off by having you share with us your name and your current title, the company you work at, and the pronouns that you use.

CYNTHIA: 01:23

Sure. I'm Cynthia O'Rourke. My current title is DS II at DAT Freight & Analytics, and my pronouns are she/her.

SUSAN: 01:31

Awesome, thank you. So tell us a little bit about how you got into data science and what brought you to DAT.

CYNTHIA: 01:38

So in college, I got a biology degree because I always wanted to go outside and hang out with the animals and the plants, and everyone who has a biology degree knows that that is not what you get to do with a biology degree.

SUSAN: 01:47

[inaudible].

CYNTHIA: 01:48

And I went into grad school. Yeah. No, you don't. You write a lot of grants. So I went into grad school for a doctorate out of undergrad and ended up with a large pile of behavioral data. I was doing a doctorate in evolutionary behavior, so the intersection of how natural selection and behavioral pressures come together to evolve behavioral strategies, specifically in mating systems in fish. I took to statistics pretty quickly over the course of the biology degree. And when I ended up with this intentionally giant but also complex pile of data, relative to the other datasets in my lab, I couldn't approach it with our standard tools. Our standard tools would be things like t-tests and ANOVAs and principal components analysis, and that just wasn't going to cut it for all the different nested variables that I had in this dataset. So I needed to do what was at the time considered cutting-edge, confusing modeling in my department, and that's generalized linear mixed modeling. I went to our statistics department and they were like, "We have heard of it." And I went to our ecology department, which was the really statsy corner of the life sciences, and they were like, "We too have heard of it." But no one on campus was able to help me with it. So over the course of the year, I really got the opportunity to dig into R work: a lot of modeling and scripting and trying to get papers through publication with these techniques that I had to largely self-teach, along with everything Ben Bolker has written on the internet. He's kind of the guru of this modeling, and he is just a gem. His comments will lead you through the process. And so I learned a lot of R, a lot of clustering, and a lot of generalized linear mixed modeling, which, for the record, is basically a logistic regression with mixed effects slapped on top.
So when we think of logit, that kind of baseline data science technique, that's a binomial generalized linear model. It took me like two months before I woke up at like 4:00 AM and was like, "Logistic regression is just a generalized linear model. I've been doing this for years." It's just that the terminology is different. But the work that I was doing was really similar to what a data scientist does. I didn't know that data science existed at the time, though. So I continued with biology, got into teaching, moved out to Portland, Oregon, and went into behavioral genomics so that I could look at the genes underlying behaviors and not just the evolutionary processes. And then eventually, I was actually dating a guy in tech, and he mentioned that I would make a good data scientist. And I said, "What is a data scientist? I love this combination of words because I love data and I am definitely a scientist. Tell me more."
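Cynthia's point that logistic regression is just a binomial GLM can be checked numerically. The sketch below, in plain Python with a small made-up dataset, fits the same model two ways: as a binomial GLM with a logit link using iteratively reweighted least squares (IRLS), the usual GLM fitting routine, and as an ordinary logistic regression by gradient ascent on the log-likelihood. The function names and the data are illustrative assumptions, and the mixed-effects part of a GLMM is not shown.

```python
import math


def sigmoid(z):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_binomial_glm_irls(x, y, iters=25):
    """Binomial GLM with logit link, fit by IRLS. Model: logit(p) = b0 + b1 * x."""
    b0 = b1 = 0.0
    for _ in range(iters):
        # Accumulate X^T W X (2x2, symmetric) and the score X^T (y - p) for X = [1, x].
        s00 = s01 = s11 = g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)  # binomial variance = IRLS weight under the logit link
            s00 += w
            s01 += w * xi
            s11 += w * xi * xi
            g0 += yi - p
            g1 += (yi - p) * xi
        det = s00 * s11 - s01 * s01
        # Solve (X^T W X) delta = X^T (y - p) with the explicit 2x2 inverse.
        b0 += (s11 * g0 - s01 * g1) / det
        b1 += (s00 * g1 - s01 * g0) / det
    return b0, b1


def fit_logistic_regression_gd(x, y, lr=0.1, steps=30000):
    """Plain logistic regression: gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            r = yi - sigmoid(b0 + b1 * xi)
            g0 += r
            g1 += r * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Toy, non-separable data, so a finite maximum-likelihood estimate exists.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

print("GLM (IRLS):          ", fit_binomial_glm_irls(x, y))
print("Logistic regression: ", fit_logistic_regression_gd(x, y))
```

Both routes land on the same coefficients because they maximize the same likelihood; only the vocabulary and the optimizer differ.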

SUSAN: 04:31

Sounds like destiny.

CYNTHIA: 04:32

Right. And so a data scientist, it turns out, gets to be the person who is just handed giant volumes of data. Now, it's not necessarily in great condition, but it's giant volumes of data, and you're supposed to make something useful out of it for a company. And I was like, "That sounds amazing." So I set my sights on that in the summer of 2018, and then I was hired on at DAT Freight & Analytics about six months into 2019.

SUSAN: 05:02

So it's that easy.

CYNTHIA: 05:03

It's that easy. You just hear about data science, and then the next thing you know you are one. That's how it goes.

SUSAN: 05:07

So yes, I like this. This is an amazing career path and career destination. Just hear about data science and just start doing it. And it's amazing how that [inaudible] come into a really neat role for you at DAT. So tell us a little bit about some of the work that you do.

CYNTHIA: 05:23

So I do a lot of ad hocs, and I think any working industry data scientist will tell you that a lot of your work is going to be ad hocs. You're in a position, especially in a medium to large company, where you get to know the data really well. And so when there's a question that isn't appropriate for a financial analyst or a specific product success analyst, when it's more general, they often go to the data science team. And so we trade off a fair number of ad hoc analytic requests around the team. A lot of my time is spent doing correlation heat maps, stuff like that, and dealing with federal data, especially messy federal data, has been a huge part of my life and my SQL journey. So you do a lot of that, and I actually really like those because they give me a chance to learn about our customers and our marketplace and how our company functions in ways that I wouldn't necessarily have learned otherwise, because you have to go digging. So it's like a little research project. And then the rest of the time is spent on productionizing: turning records, which is what I think of as data, into revenue for the company, which means we are selling it to someone internal or external. I generally work on externally facing products. And so there's the process of hitting some kind of algorithmic success metric that serves a customer success metric, and, as you've probably heard from a bunch of other people, very little of that is actually the algorithms. A ton of that is figuring out who the customer is, talking with the stakeholders, talking with the domain experts, getting comfortable with the data, what it can and can't do, and trying to figure out what success looks like and how to measure it along several different axes. So I'm in a lot of meetings, a lot of meetings.
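The correlation heat maps mentioned above are usually drawn from a pairwise correlation matrix. A minimal sketch in plain Python, assuming a handful of hypothetical numeric columns (the names and numbers are invented, not DAT data), computes that matrix and prints a crude text version; a plotting library would normally handle the colors.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def correlation_matrix(columns):
    """Pairwise correlations for a dict of column name -> values (one heat map cell each)."""
    names = list(columns)
    return {(r, c): pearson(columns[r], columns[c]) for r in names for c in names}


def print_heat_map(matrix, names):
    """Text stand-in for a colored heat map: one row and one column per variable."""
    print(" " * 14 + "".join(f"{n[:10]:>11}" for n in names))
    for r in names:
        print(f"{r[:13]:<14}" + "".join(f"{matrix[(r, c)]:>11.2f}" for c in names))


# Hypothetical weekly metrics for one region (illustrative numbers only).
data = {
    "loads_posted": [120, 135, 150, 160, 140, 170],
    "trucks_posted": [80, 78, 70, 65, 75, 60],
    "spot_rate": [2.10, 2.25, 2.40, 2.55, 2.30, 2.70],
}
m = correlation_matrix(data)
print_heat_map(m, list(data))
```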

SUSAN: 07:11

Yeah, I bet.

CYNTHIA: 07:13

But it's really fun because I get to learn about freight trucking, and that is a wildly complex ecosystem.

SUSAN: 07:21

Yeah, yeah. And I like how you call it an ecosystem.

CYNTHIA: 07:24

Yeah. Well, and I'm biased because I come out of an ecological background, behavioral ecology. So I'm really comfortable with these heterogeneous spaces where nothing is independent. Things kind of cycle back onto themselves, and you can't assume that any two things are exactly like one another. Everything's a little bit different and the data is incredibly messy, and that is absolutely true. So we have three basic customer types in my company. We have truckers: the people you see out on the road driving trucks, but also the companies behind them, the people back in their offices who are dispatching loads for those trucks to carry and taking care of the insurance requirements and the federal bond requirements. There are whole businesses that essentially carry freight. So that's one, the truckers. And then we have the shippers, and those are the people who have freight to move. This is kind of interesting because they don't think of themselves as shippers. Walmart doesn't think of itself as a shipper. But for us, they're a shipper because they have things to move. And then the third is this intermediary partner called brokers, and a broker just mediates the transaction between the shipper and the trucker. They never actually hold the load, but they make that handshake happen. And so we have these three customer bases, and that alone would be a fairly complex set of demographics. But then think about all the people in the country who have things to ship. If you've ever mailed a package, you are technically a shipper. So you and I are shippers. But then so is Kraft Foods.

SUSAN: 08:48

Yeah. A little different scale. Yeah.

CYNTHIA: 08:52

Yeah, yeah. And then if you think of truckers, my uncle Bill back in Michigan was an owner-operator for a long time. He owned his truck, he was leased on to a company, and he drove his truck. So he is a trucker, that guy with one truck. On the other hand, Amazon is also a carrier, as we would call it. So they are a trucker; they have a huge fleet of trucks. So there's that wide range there. And then brokers, the same thing. Some brokers are just little operations of three or four people, and some are C.H. Robinson, which is four times larger than the next largest broker in the country.

SUSAN: 09:24

Yeah, amazing.

CYNTHIA: 09:25

Yeah. So it's this wonderfully diverse set of customers. And then we have this amazing set of data. Well, I say set like it's one set of data, but our company goes back 42 years now, 43 years now. We were originally a corkboard at a North Portland truck stop, Jubitz Truck Stop.

SUSAN: 09:44

I love this story.

CYNTHIA: 09:46

Yeah. No, I've actually been to this truck stop. I have not seen the corkboard, but I've been to this truck stop. And so people who had a load would say, "I've got a load. I need to get this load of whatever out of Portland, Oregon, this Saturday, and I need to get it to New York City by the end of the week." And they would stick that load on an index card on the corkboard, and truckers like my Uncle Bill, who would bring his truck to Portland and bike around town and then be ready to take something else, would say, "Well, I'm heading to the East Coast. I could take your load." So Uncle Bill would look at that index card, call the number, and negotiate to carry that load back for a certain rate. That's called a spot rate, and it's a rate that has not been agreed on long in advance. It was just an index card on a corkboard, and we handle those. We are still that kind of system, only now we are handling just shy of a million loads per business day on our load board, which is very digital now.

SUSAN: 10:36

That would be a lot of index cards.

CYNTHIA: 10:38

Yeah. So the Jubitz brothers built out from that index card situation, first to monitors in truck stops around the country, so you could look up at the monitor, because the index cards were getting stolen. Humans are humans. Over time, it turned into this massive database and this massive number of transactions every day, from which we can then collect signals. For example, the number of loads that are posted gives you an indication of the demand for trucks in the country, and that can change from region to region. So if Portland, Oregon, currently has a ton of loads posted, we know that if you're a trucker, you're in a really good bargaining position to ask for a higher rate to take one of those loads out of Portland, Oregon. And conversely, if there aren't very many loads posted, or if a bunch of trucks have posted, since we also let trucks post themselves to our load board, then you know as a trucker that you are not in a good bargaining position. So that's some of the data that we have. We also have rates data. We track over time how much it costs to move a load around the country. That is really important data for a lot of people who are calculating what they should be asking for to move a load or what they should be paying to move a load, and that's called our RateView database. We have a bunch of other data on top of that. We are filthy rich in data, and it is a delight if you love working with data, which I do. It's just an embarrassment of riches.
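The bargaining-position logic described above can be expressed as a ratio of posted loads to posted trucks per region. This is a minimal sketch, assuming simple daily counts; the regions, counts, and the cutoffs of 3.0 and 1.0 are hypothetical placeholders, not DAT's methodology.

```python
def load_to_truck_ratio(loads_posted, trucks_posted):
    """Posted loads per posted truck in one region; higher means tighter truck capacity."""
    if trucks_posted == 0:
        return float("inf")
    return loads_posted / trucks_posted


def bargaining_position(ratio, tight=3.0, loose=1.0):
    """Label a region from the trucker's point of view.

    The default cutoffs are hypothetical placeholders, not industry thresholds.
    """
    if ratio >= tight:
        return "trucker has leverage"
    if ratio <= loose:
        return "shipper/broker has leverage"
    return "balanced"


# Hypothetical daily (loads posted, trucks posted) by origin region.
postings = {
    "Portland, OR": (900, 200),
    "Atlanta, GA": (400, 350),
    "Fresno, CA": (150, 300),
}

for region, (loads, trucks) in postings.items():
    r = load_to_truck_ratio(loads, trucks)
    print(f"{region}: {r:.2f} -> {bargaining_position(r)}")
```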

SUSAN: 12:00

That's awesome.

CYNTHIA: 12:01

One of our newer DSs, Andrew Sandall, says it's like being a kid in a candy store, something like that.

SUSAN: 12:07

That's great. And you mentioned the federal data as well. So how does that work into your process, to the degree you can talk about it?

CYNTHIA: 12:14

The federal data is awesome in this country. What our tax dollars do for us in terms of generating and storing data is amazing. So you could just sort through different data sets at the federal level all day and never get to the end of it, never find out how many different feeds there are out there. Fortunately, I only have to look at the macroeconomic and the freight and trucking data.  

CYNTHIA: 13:52

And it requires some cleaning. It's a big data set and there's a lot of human-entered data, and especially the smaller truckers can't necessarily be trusted to be careful about what they enter. But once you've got that cleaning logic, you get this insight into what the fleet sizes are. So how many Uncle Bills, how many owner-operators, are out there driving right now? How many Amazons are out there right now? Of those two groups, where are the drivers? Right now, we have a driver shortage in the country. So which companies are succeeding in hiring a bunch of drivers? Which companies are losing their drivers to companies of other sizes? There's a ton of data in there. And so, yeah, once a month, Snowflake is populated with that data. We've set up a Python script that pulls it off the internet, unzips it, and throws it into Snowflake. We have a great data engineering team. And then I've got a little Python notebook that just runs and automatically aggregates it all, and we can all easily take a glance at what's happening in the fish population, or the trucker population, this month.
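The monthly job described above (pull a file, unzip it, aggregate it) can be sketched with Python's standard library. This is an illustration under assumptions: the column name `power_units`, the fleet-size buckets, and the demo file are hypothetical, and the download and Snowflake load steps are left out. In practice the zip bytes would come from the downloaded federal file.

```python
import csv
import io
import zipfile
from collections import Counter


def read_zipped_csv(zip_bytes):
    """Yield rows (as dicts) from every .csv file inside a zip archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".csv"):
                with zf.open(name) as fh:
                    text = io.TextIOWrapper(fh, encoding="utf-8", newline="")
                    yield from csv.DictReader(text)


def fleet_size_bucket(power_units):
    """Hypothetical size buckets; a real analysis may cut these differently."""
    if power_units <= 1:
        return "owner-operator (1)"
    if power_units <= 10:
        return "small (2-10)"
    if power_units <= 100:
        return "mid (11-100)"
    return "large (101+)"


def aggregate_fleets(rows, column="power_units"):
    """Count carriers per fleet-size bucket, skipping blank or non-numeric entries."""
    counts = Counter()
    for row in rows:
        raw = (row.get(column) or "").strip()
        if not raw.isdigit():
            continue  # human-entered data: drop values we cannot parse
        counts[fleet_size_bucket(int(raw))] += 1
    return counts


# Demo: an in-memory zip standing in for the downloaded monthly file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("carriers.csv", "carrier_id,power_units\n1,1\n2,5\n3,\n4,250\n5,abc\n6,1\n")
print(aggregate_fleets(read_zipped_csv(buf.getvalue())))
```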

SUSAN: 15:09

Very cool.

CYNTHIA: 15:10

That's actually, if I were starting off in data science-- a lot of people do [inaudible] competitions and they look at very similar data sets. The federal data trove is incredible. There are so many things out there, and it's not really well annotated. But if you just dig in, at the Bureau of Labor Statistics, for example, some of these time series go back for decades. So go to FRED, F-R-E-D, the economic database that'll just let you plot data in real time. And then you can pull down the CSVs if you find something interesting. That's where I would go if I were just starting off in data science right now to get some interesting data sets. Also, your tax dollars pay for it. Might as well use what you're paying for, right?

SUSAN: 15:53

Yeah, that's cool. That FRED website is very cool. I have looked at that before, and you can even get seasonally adjusted data sets, all sorts of stuff. It's pretty neat.

CYNTHIA: 16:04

Or take the seasonality out if you want to do your own seasonality adjustments.
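For readers who want to do their own seasonality adjustment on a not-seasonally-adjusted series, here is a deliberately crude additive version in plain Python: average each position in the cycle, subtract the overall mean to get a seasonal index, and remove that index from every observation. It assumes complete cycles and little trend; official statistics typically rely on far more careful methods such as the Census Bureau's X-13ARIMA-SEATS.

```python
from collections import defaultdict


def seasonal_indices(values, period=12):
    """Additive seasonal index for each position in the cycle.

    Index = mean of all observations at that position minus the overall mean.
    Assumes values[0] is the first position (e.g., January) and roughly no trend;
    a trend would leak into the indices.
    """
    grand_mean = sum(values) / len(values)
    by_position = defaultdict(list)
    for i, v in enumerate(values):
        by_position[i % period].append(v)
    return {pos: sum(vs) / len(vs) - grand_mean for pos, vs in by_position.items()}


def seasonally_adjust(values, period=12):
    """Subtract each observation's seasonal index (a crude additive adjustment)."""
    idx = seasonal_indices(values, period)
    return [v - idx[i % period] for i, v in enumerate(values)]


# Toy quarterly series: a level of 100 plus a repeating seasonal swing.
swing = [5, -5, 10, -10]
series = [100 + s for s in swing] * 3
print(seasonally_adjust(series, period=4))
```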

SUSAN: 16:07

Right, right. Yeah. Yeah, nice. Very cool. So I think you had mentioned, too, a recommender system that you're working on. Is that something that you can tell us a bit about?

CYNTHIA: 16:18

Yeah. So we don't just have access to that federal data. We also have a long history of a lot of those federal data stores. Not all of that federal data is maintained indefinitely by the agencies that report it, but we capture a lot of it and we store it in stuff like. And so we have this problem for shippers and for brokers, the people who handle the transactions between truckers and shippers, where you need to keep a pool of reliable truckers that you can reach out to on short notice. If you have a load to move this Saturday, you need to be able to get in touch with someone this Saturday who has a truck in the right position, who you can trust, and who has the right insurance to move that load. So you have to keep this kind of constantly refreshed Rolodex of carrier contacts. Our company has about 100,000 carrier customers, which is a good-sized number. And we also have some data, which I'm not going to speak to specifically, that is great for training a model on the preferences that brokers have for carrier customers. So, who would you like to do business with? And more importantly for me as a data scientist, what attributes are informative of who you want to do business with? Of all these different ways we could describe a trucker (the fleet size, the kind of equipment they have, the geographies they run in), which of these matters to you? And so I did the actual machine learning work to pull out what's most informative, signal-wise. What do they actually care about? What are the things that are most predictive of whether or not this person is going to be in their network? And then I worked downstream and simplified it into a Euclidean distance scoring model with some extra jujus on it so that we can host it very, very quickly in AWS.
So you can run it, you can literally cycle through several million carriers and dozens of attributes every day and get lists of 200 for every carrier. This enables us to turn around and offer it to our customers: if you need a carrier that is like a specific carrier, here's a ranked list of carriers that we can recommend that are like that carrier. And if you give us a list of carriers you like, we can take a second pass at that and give you list-based recommendations. You seem to like this kind of carrier on top of the specific carrier, so we're going to give you a list of 200 of this kind of carrier to reach out to. So it's almost like a networking recommender, like LinkedIn does.
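The recommender described above can be sketched as a weighted Euclidean distance ranking with two passes: carriers closest to one specific carrier, and carriers closest to the centroid of a list the customer already likes. Everything beyond "Euclidean distance scoring" is an assumption for illustration: the feature names, the z-score standardization, the weights (standing in for whatever the earlier machine learning step found informative), and the centroid approach to the list-based pass. The "extra jujus" of the production model are not reproduced, and categorical attributes such as equipment type would need an encoding (for example, one-hot) first.

```python
import math


def standardize(carriers, features):
    """Z-score each feature so no single attribute dominates the distance."""
    stats = {}
    for f in features:
        vals = [c[f] for c in carriers.values()]
        mean = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0
        stats[f] = (mean, sd)
    return {cid: [(c[f] - stats[f][0]) / stats[f][1] for f in features]
            for cid, c in carriers.items()}


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two standardized feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def recommend(target, vectors, weights, k, exclude=()):
    """Return the IDs of the k carriers closest to a target vector."""
    scored = sorted((weighted_distance(target, v, weights), cid)
                    for cid, v in vectors.items() if cid not in exclude)
    return [cid for _, cid in scored[:k]]


def like_this_carrier(cid, vectors, weights, k):
    """First pass: carriers most similar to one specific carrier."""
    return recommend(vectors[cid], vectors, weights, k, exclude={cid})


def like_these_carriers(liked, vectors, weights, k):
    """Second pass: carriers closest to the centroid of a list the customer likes."""
    dims = len(next(iter(vectors.values())))
    centroid = [sum(vectors[c][i] for c in liked) / len(liked) for i in range(dims)]
    return recommend(centroid, vectors, weights, k, exclude=set(liked))


# Hypothetical carriers with numeric attributes only, for illustration.
carriers = {
    "A": {"fleet_size": 1, "reefer_share": 0.0, "avg_haul_miles": 450},
    "B": {"fleet_size": 2, "reefer_share": 0.0, "avg_haul_miles": 500},
    "C": {"fleet_size": 300, "reefer_share": 0.9, "avg_haul_miles": 1200},
    "D": {"fleet_size": 250, "reefer_share": 0.8, "avg_haul_miles": 1100},
    "E": {"fleet_size": 40, "reefer_share": 0.5, "avg_haul_miles": 800},
}
features = ["fleet_size", "reefer_share", "avg_haul_miles"]
weights = [1.0, 2.0, 1.0]  # hypothetical importances from an earlier modeling step
vectors = standardize(carriers, features)

print(like_this_carrier("A", vectors, weights, k=2))
print(like_these_carriers(["C", "D"], vectors, weights, k=2))
```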

SUSAN: 18:49

Yeah, yeah, that's very cool. And this is something that is still in the works or something you already deployed or?

CYNTHIA: 18:56

Still a work in progress as far as getting it out to the customers. But it's embedded in a series of features which I'm not allowed to talk about, which--

SUSAN: 19:02

Sure, sure, sure.

CYNTHIA: 19:03

--are in beta customers' hands right now. So it'll be out there shortly. It steals the basic structure of LinkedIn's first-pass, second-pass network-expanding thing. They had a really good idea and I love their algorithm, so I thought I'd crib their notes.

SUSAN: 19:21

Yeah, yeah. Well, I just love the creativity, though, in taking that to trucking. I mean, as somebody with no knowledge of the industry, that's not something that would have occurred to me, but I think that's such a neat application of that concept.

CYNTHIA: 19:34

I mean, when you're doing it-- I figure one of the first steps when you are developing a data science product is to know what you're trying to deliver, and getting that far is actually pretty far into the process. But once you've talked with the stakeholders and you know what you're trying to deliver, one of the first steps is to go on the internet and see which generous company has told you how they've done something similar. Stitch Fix is really good about doing this, so I'll do a full tour through their algorithm stack. LinkedIn has been very open, especially about the engineering and technical end of what they do. There are just some companies out there that are willing to gift you with what has worked for them, and you can go touch their product and decide if you like that output. And then you can steal that structure. So why not?

SUSAN: 20:17

Yeah, absolutely. Absolutely. Very cool. You mentioned some of the issues that you've faced with some of the messy data that comes up, even handwritten data and so forth. And it sounds like this is something you strangely enjoy. But what are some of the strategies that you've used to deal with that, especially some of the stuff that's just maybe inconsistent or untrustworthy?

CYNTHIA: 20:40

Well, so in the report that I handed off this morning, there is one line in the Python that is just, "exclude these three people," not these guys. So we're looking at millions and millions of truckers in the United States, but not these three. And they weren't added all at once. You'd process the data, the data would do something really strange, and you'd dig into it and be like, "Oh, Joe in New Jersey with a single tow truck. You do not have a fleet of 10,000 trucks. Joe, I am going to remove you from the database going forward." And so there's a lot of that. Actually, this morning, we have someone that we're going to have to add a note in about because they slammed our system, and it looks like they posted too many loads. So we're going to have to clean them out. We have these measurements of how many loads are posted, and then suddenly you get this giant spike and the spike isn't real. And so we have to post-process and remove that person and whatever it is they were doing on their side. So there's a lot of that. But freight and logistics in general, it's a long-standing industry, you know what I mean? The railroads expanded this country. It's foundational to the United States, which means that there is a lot of tradition and a lot of "this is the way we've always done things." And there's a lot of data that's on carbon copies or written on hand-scrawled bills. Actually, the pandemic was good for getting people to move to eDocs, so that's kind of nice. A lot of people's backend systems don't talk to other people's backend systems. They don't speak the same language; they're not on the same formats. When customers want to submit data to us, we have to figure out how to clean it up and make it play with everyone else's data when we post our rate stuff. So yeah, it's this big, heterogeneous data ecology as well as a user and problem ecology, and that requires a lot of not making assumptions.
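The two cleaning moves described above, a hard-coded exclusion list and removal of a sudden, unreal spike in posted loads, might look like this in Python. The account IDs are hypothetical, and the spike rule (a robust z-score based on the median and the median absolute deviation, with a threshold of 5) is a swapped-in technique chosen for illustration; the conversation does not say how spikes are actually detected.

```python
import statistics

# Hypothetical exclusions, accumulated one at a time as bad records turn up
# (e.g., a single tow truck reported as a 10,000-truck fleet).
EXCLUDED_ACCOUNTS = {"acct_123", "acct_456", "acct_789"}


def drop_excluded(rows, excluded=EXCLUDED_ACCOUNTS):
    """Remove rows belonging to accounts known to report bad data."""
    return [r for r in rows if r["account"] not in excluded]


def flag_spikes(daily_counts, threshold=5.0):
    """Return indices of days far from the median, using a robust z-score (median/MAD)."""
    med = statistics.median(daily_counts)
    mad = statistics.median(abs(c - med) for c in daily_counts)
    if mad == 0:
        # Degenerate case: most days are identical, so flag anything different.
        return [i for i, c in enumerate(daily_counts) if c != med]
    # 1.4826 scales the MAD to match a standard deviation for normally distributed data.
    return [i for i, c in enumerate(daily_counts)
            if abs(c - med) / (1.4826 * mad) > threshold]


rows = [{"account": "acct_123", "loads": 5}, {"account": "acct_999", "loads": 2}]
print(drop_excluded(rows))
print(flag_spikes([100, 104, 98, 101, 99, 5000, 102]))
```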
Not making assumptions, and carefully doing EDA on not just your input data but also the output you get, and thinking about why that output might look a certain way and whether that matches up with reality. I call it sniff testing, and I will often send it to other people in the company and be like, "Does that look like it-- am I in the right ballpark at least? What's happening here?" So there's a lot of-- it's just a legacy field. There's a lot of money coming into it right now, freight and trucking and logistics. There have been a lot of eyeballs on us over the last two years because of COVID. Everyone's like, "Toilet paper, food, housing, materials, what's happening?" And we're like, "We know. We know what's happening." So there's been a lot of interest, but even before that, there was a lot of venture capital flowing into the space because there's so much room for technological optimization, for pulling inefficiencies out, for making things faster, for letting people do their jobs more rapidly with less tedium, for increasing analytics and data insight in the field. There's so much low-hanging fruit, and giant bodies of very messy data which, if you can crack them open, are just rich, just fantastic. So, for example, the Koch brothers, last year I believe it was, funded a startup that is going to automate the unloading of trucks at docks. So in the delivery docks via-- what do they call those things? Little pallet lifters.

SUSAN: 24:05

Oh, forklifts.

CYNTHIA: 24:05

It's like a forklift. Yeah.

SUSAN: 24:07

Yeah, yeah.

CYNTHIA: 24:07

So automated forklifts. Everyone's talking about automated long-haul trucking and automated cars, and the Koch brothers are thinking automated forklifts to unload the trucks, because when trucks get stuck at those loading docks, that can be 3 to 10 hours out of a trucker's day, and truckers are regulated. They can only drive for certain periods of time, so that can take them off the road for a giant chunk of their week. They just lose those hours as far as the federal government is concerned; they're not allowed to use them to drive. So if you can make loading docks more efficient, you can effectively put more trucks on the road. You can increase the capacity of the freight systems in the United States.
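The capacity argument can be made concrete with a little arithmetic. This sketch assumes the commonly cited US hours-of-service limits for property-carrying drivers, up to 11 hours of driving within a 14-hour on-duty window, and treats dock waiting as on-duty time inside that window. It ignores rest breaks, weekly limits, and sleeper-berth rules, and the wait times are hypothetical.

```python
def driving_hours_left(dock_wait_hours, drive_limit=11.0, on_duty_window=14.0,
                       other_on_duty_hours=0.0):
    """Driving hours still usable in one shift after time spent waiting at a dock.

    Simplified model: the wait falls inside the on-duty window; breaks,
    weekly limits, and sleeper-berth provisions are ignored.
    """
    remaining_window = on_duty_window - dock_wait_hours - other_on_duty_hours
    return max(0.0, min(drive_limit, remaining_window))


def fleet_hours_lost(dock_waits, drive_limit=11.0, on_duty_window=14.0):
    """Total driving hours lost across trucks, relative to no dock wait at all."""
    baseline = driving_hours_left(0.0, drive_limit, on_duty_window)
    return sum(baseline - driving_hours_left(w, drive_limit, on_duty_window)
               for w in dock_waits)


# Hypothetical dock waits (hours) for five trucks on the same day.
waits = [3, 5, 10, 2, 6]
for w in waits:
    print(f"wait {w:>2} h -> {driving_hours_left(w):.1f} driving hours left")
print("driving hours lost across the group:", fleet_hours_lost(waits))
```

Under these assumptions a short wait costs no driving time, but waits toward the long end of the 3 to 10 hour range cut directly into it, which is the capacity faster docks would recover.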

SUSAN: 24:41

Interesting.

CYNTHIA: 24:42

Tons of venture capital, tons of low hanging fruit, really messy data. A fair amount of Baby Duck Syndrome, too. A lot of people who are like, "We have done this this way forever. Please don't make me learn something new. I'm very busy."

SUSAN: 24:54

Okay, Baby Duck Syndrome, I don't think I've heard that phrase. You'll have to tell me that one.

CYNTHIA: 24:59

So you know when Google changes something about its interface and you're like, "I hate this," and then you forget about it a day later?

SUSAN: 25:05

Oh, yeah. Yep, yep, yep.

CYNTHIA: 25:07

Yeah. So it's actually [Tinbergen?], I think, who studied the ducks, and little baby ducks will imprint on the first thing they see, and they'll imprint on-- and I don't think this is-- yeah, so they'll follow the mama duck. But if the first thing they see is a boot, an ecologist's boot--

SUSAN: 25:22

Oh, no.

CYNTHIA: 25:22

--then they would follow that boot. So it's not that the boot is the right thing to follow, it's that they've imprinted on it and they have Baby Duck Syndrome.

SUSAN: 25:29

Oh, okay. All right. That makes sense. Oh man, that's too funny. I love it. I love how we've gotten into fish mating and baby ducks imprinting carrots. We've covered a lot of territory. [laughter]

CYNTHIA: 25:42

Good stuff.

SUSAN: 25:42

Well, I wanted to ask you a question that we always ask on the podcast. We call this the alternative hypothesis. So what's something that people think is true about data science or about being a data scientist that you have found in your experience to be false?

CYNTHIA: 26:00

So much, so much, so much. But I think my favorite thing, because I love being wrong, especially about something that I'm really insistent that I'm right about: I heard of AutoML, so automated machine learning, and I was like, "That is the stupidest thing I've ever heard of. Are you kidding me? This is a bad idea." So, extreme negative reactions, sneering, all that stuff. And then like a year later at DAT Freight & Analytics, I got a chance to sit in on a demo from an AutoML platform. And once I understood what they were trying to do, the things they were and were not trying to automate, I was blown out of the water, because the things they're trying to automate-- and this is where my own ego comes in, "Oh, you can't automate my job." Our job as data scientists is to automate and take the tedium out of other people's work, so that you don't have to sit down and try to draw your own forecast of a time series every single day. We're going to automate that. We're going to make sure you don't have to process your own PDFs all day, every day. That's literally my job. And I was like, "Oh, no one could do that to my job," but--

SUSAN: 27:08

I'm not a forklift.

CYNTHIA: 27:10

Exactly. But that's what they're doing. I mean, they're automating out the process of data cleaning, imputation, sample balancing, checking the model against 39 other different models, different pipelines to use, whether or not to use any kind of differencing in the pipelines. Just everything that is-- it's like a hyper grid search. So if you enjoy rolling your own grid searches and implementing a grid search with every single Python script you run and like doing that, I mean, sure. But if you don't enjoy that work, if what you want to do is focus on the data and choosing the best model structure to get the data to go where it needs to go and all the stuff that goes around the stuff, the productionizing aspect, getting it out there to the customers, checking to see whether you're answering their pain points, checking to see whether it fits into the business strategy, all that other stuff is what I'd rather focus on. And so to have someone automate out what I think of as the pretty tedious and straightforward-- not straightforward, they've got Kaggle experts doing this. So let me not even begin to imply that I could do it better. But what it feels like to me, it feels like the first CPAs who got to see Excel working, you would have looked at that and been like, "This is amazing. There's so much of my rote daily behavior that I don't have to do anymore." And now you would be extremely hard pressed to find a CPA that doesn't use Excel, right?

SUSAN: 28:32

Yeah, yeah.

CYNTHIA: 28:33

I don't know if one exists. And I think that's where we're headed with AutoML. It's not that it's going to get rid of data scientists. It's going to be like data science on steroids. So that has been really exciting to see. And to be totally wrong about, so wrong about.

SUSAN: 28:47

That's great.

CYNTHIA: 28:48

Yeah, I mean, I still dive into the data because it's messy, right?

SUSAN: 28:51

Oh, yeah. Sure.

CYNTHIA: 28:52

And you want to understand. You're going to get a better model if it's not garbage in. So you want to understand how to clean it up, how to tidy it up, how to make it sane before you feed it into the model. And for me, understanding what the data is doing and how it is feeding into the model, what you are and are not getting signal out of, is crucial for making the models better. Part of me is like an academic: I want to understand what's happening in the ecosystem. But part of me wants to know, if this is really informative for the model, that it's really informative and how, because we may have another data table out there with some related information that it would make sense to add into this model and maybe make it a lot better. So the explainability that these tools speed up, knowing which of the attributes, which of the variables, reliably pulls out signal across a number of different models, will tell you a little bit about how the world works, and that's useful for building a better model.
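One common, model-agnostic way to see which variables reliably carry signal is permutation importance: shuffle one feature at a time and measure how much a model's accuracy drops. The sketch below uses a toy threshold model and invented data as assumptions; running the same function over several candidate models is one way to check whether a feature matters consistently.

```python
import random


def accuracy(model, X, y):
    """Share of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled independently."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            # Rebuild rows with column j replaced by its shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: the label depends only on feature 0; feature 1 is noise.
X = [[i / 19, (i * 7 % 19) / 19] for i in range(20)]
y = [int(row[0] > 0.5) for row in X]


def threshold_model(row):
    """Stand-in for a trained model that happens to use only feature 0."""
    return int(row[0] > 0.5)


print(permutation_importance(threshold_model, X, y))
```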

SUSAN: 29:49

Absolutely. Yeah.

CYNTHIA: 29:50

It's interesting. It's just really fascinating. So, yeah, but I definitely do not want to do a grid search to see all the different ways that I could lag a time series. I would like someone else to set that up for me automatically.
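A hand-rolled lag search of the kind described above looks roughly like this. To keep it dependency-free, the forecaster is an assumption chosen for illustration (the mean of past values at a given lag), scored by mean absolute error on a holdout; an AutoML platform would search much wider spaces, including model families and differencing.

```python
from itertools import product


def lagged_mean_forecast(series, t, lag, window):
    """Forecast series[t] as the mean of series[t - lag], series[t - 2*lag], ... (window values)."""
    return sum(series[t - k * lag] for k in range(1, window + 1)) / window


def grid_search_lags(series, lags, windows, holdout):
    """Score every (lag, window) pair by mean absolute error on the last `holdout` points."""
    results = []
    start = len(series) - holdout
    for lag, window in product(lags, windows):
        if lag * window > start:
            continue  # not enough history before the holdout for this combination
        errors = [abs(series[t] - lagged_mean_forecast(series, t, lag, window))
                  for t in range(start, len(series))]
        results.append((sum(errors) / holdout, lag, window))
    return min(results)  # lowest MAE; ties go to the smaller lag, then the smaller window


# Toy series with a repeating four-step pattern.
series = [10, 20, 30, 40] * 6
print(grid_search_lags(series, lags=range(1, 7), windows=range(1, 4), holdout=8))
```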

SUSAN: 30:02

There you go. Awesome. So, is there anything we haven't talked about yet that you want to be sure to discuss, something I should have asked about that I didn't?

CYNTHIA: 30:11

Yeah, well, I think something interesting about working at DAT, so we're freight and analytics. People hear about us and they're like, "Oh, you're in trucking." We're actually an entire building full of nerds. It's just a giant pile of software nerds. We're software as a service. And so it's always weird to me to go-- I mean, people in town will be like, "DAT, you guys do software." But outside of town, people will be like, "Oh, trucking, what's that like?"

SUSAN: 30:41

They think you're hanging out at the Jubitz Truck Stop in North Portland?

CYNTHIA: 30:44

Yeah, exactly. Which, I do want to go. I want to go to Jubitz and buy the truckers beers and ask them questions, absolutely. But I've never driven a truck. I drive a Toyota Corolla. Come on. We are true blue nerds. We are a company full of product people and engineers. I don't know, like Apple or Google, Facebook, Airbnb, we're honestly in that space, but we have decades and decades of proprietary data to work with. And it's just, I don't know. The remark is that I really lucked out. I took an interview with a trucking company and I was like, "I've looked at the things you are requesting someone do, and I'm pretty sure I could do that." And I showed up and it was just this wonderland of nerds and data.

SUSAN: 31:29

Yeah, that's awesome. That's awesome. I mean, keeping an open mind to the different opportunities that are out there and different industries. I mean, that's one of the cool things about data science to me is that every industry needs it. And so--

CYNTHIA: 31:41

You can apply it to anything.

SUSAN: 31:43

Yeah, yeah.

CYNTHIA: 31:45

Yeah. I scraped the-- I shouldn't say this. Well, they can pump me or not. Indeed, I don't resell their data. So it's legal, but I scrape Indeed's hiring numbers, [inaudible] number for different things, one of which is data science. And over the course of the pandemic there has been nothing but a steady increase in demand for data scientists. And I think it makes sense. A lot of people are focused on data science as big data and [inaudible], all that stuff. But there are so many companies with growing bodies of small data that they could extract real revenue out of, really make that data useful to them. So I'm psyched to be in the field when it's growing in this way: applied data scientists, people who are just in there, like a plumber, to do the job. And AutoML tools are coming online to make this stuff easier for the average person to use; an analyst could do data science with AutoML tools and AutoML platforms for DevOps. That's exciting.

SUSAN: 32:43

Yeah. Yeah, absolutely. Very cool. Yeah, that's actually something that we didn't touch on. We didn't really talk about the effects of the pandemic on your work. Is that something that has-- I mean, we don't have to talk about it if it's [inaudible].

CYNTHIA: 32:57

It's touched us a little bit. Yeah. Early days we were on Slack and our director was meeting with FEMA once a week. So that was really exciting.

SUSAN: 33:04

Oh, wow. Yeah, that's wild.

CYNTHIA: 33:06

Yeah, because I mean, they wanted to know whether people were going to have food in the grocery stores. There were a bunch of big open questions in the early lockdown about how we were going to move freight across the country when freight is driven by human beings who can catch this disease, and a lot of them are from the baby boom generation. Truckers were worried that they would get sick and their dog who travels with them in the cab of their truck would have nowhere to go because they would be in a strange town.

SUSAN: 33:31

Oh, oh.

CYNTHIA: 33:33

Right? It was crazy. It was a very-- and then the rates have-- the spot rates, so the rates that we track, have shot off the charts. They--

SUSAN: 33:43

Wow.

CYNTHIA: 33:44

Trucking is a cyclical thing, so we have these long periods where there are too many trucks and then we have long periods where a bunch of trucks leave the market and now there's too much freight. And in the times when there's more freight than trucks, the rates go high, the rates to ship stuff go high. And then that draws a bunch of trucks back into the market. And so then you have too many trucks and not enough freight and the rates go low. It's these long-term macroeconomic cycles, and the one that we are in right now is unprecedented because of all these imbalances: freight coming in from weird parts of the world, manufacturing being shut down and then turned back on, and all the networks off balance. And so spot rates are just off the charts and have been for quite some time.

SUSAN: 34:26

Wow.

CYNTHIA: 34:26

So yeah, our world has been very rocked by the pandemic, like everyone else's world, but it's been really very interesting to watch. Yeah.

SUSAN: 34:36

For sure. Well, and with access to the kind of data that you have, I mean, I'm sure that's been super interesting to observe.

CYNTHIA: 34:42

Super interesting. Yeah, high-frequency, real-time data. Yeah. We can see what's happening in the economy on a daily and even hourly basis. And that's really fun.

SUSAN: 34:52

Yeah, for sure. Cool. Anything else?

CYNTHIA: 34:57

No. My dog has been growling in his sleep for about 15 minutes now, so that's cool.

SUSAN: 35:05

Aw, that's the--

CYNTHIA: 35:06

Would you like to know what I'm drinking?

SUSAN: 35:07

Oh, yes, yes, yes. We forgot. Oh my goodness, I've been so distracted by my stupid internet that I forgot. So do you have anything special that you're enjoying there with you today?

CYNTHIA: 35:16

I do. I have the most insufferable Portland answer for this. I love it. My pandemic hobby was to start to roast my own coffee.

SUSAN: 35:23

Oh, fun. Nice.

CYNTHIA: 35:24

And I still don't know if I'm very good at it or not. I mean, my friends are all polite, so who knows? And my dad's not going to tell me it's bad. But recently I went on a business trip, and when I came back I had run out of my hand-roasted coffee and had to use the stuff that I was using before, which is unlocally roasted coffee. It's fine. And what I have discovered is that I don't know whether my coffee is good, but I now hate what I was drinking before.

SUSAN: 35:48

Wow, how interesting.

CYNTHIA: 35:50

So I am drinking a dark roasted Brazilian peaberry from Serena. The region is Serena.

SUSAN: 35:58

Very nice.

CYNTHIA: 35:59

Yeah, that is my insufferable Portland answer. What are you drinking?

SUSAN: 36:03

It reminds me of the name of the chicken episode in Portlandia.

CYNTHIA: 36:07

Yes. Yes, it is very much the name of the chicken. [laughter]

SUSAN: 36:12

Yeah, when you know the town, that's like next level right there. The place that we recently moved to doesn't have very good-tasting drinking water. So my husband has been trying, I think, every single LaCroix flavor and we are now on--

CYNTHIA: 36:26

I love that game.

SUSAN: 36:26

Guava Sao Paulo. So yes.

CYNTHIA: 36:30

What's your favorite LaCroix flavor?

SUSAN: 36:31

Probably just lime. I like stronger flavors. Yeah, this one's not bad. This is the first one I've had out of this case, so we are going through it, though. You should produce some increased shipments of LaCroix to your [inaudible]. You're good at this, right?

CYNTHIA: 36:48

Get in there. See what our commodities notes say.

SUSAN: 36:53

Thanks for listening to our Data Science Mixer chat with Cynthia O'Rourke. Join us on the Alteryx Community for this week's Cocktail Conversation to share your thoughts. Here's our conversation starter for this week. Cynthia and I talked about the truck stop where her company's origin story began. Have you ever taken a field trip to a specific place to inform your data science work or to better understand your industry? Tell us about it and share how it changed your work. Share your thoughts and ideas by leaving a comment directly on the episode page at Community.Alteryx.com/podcast or post on social media with the hashtag DataScienceMixer and tag Alteryx. Cheers. [music]



This episode of Data Science Mixer was produced by Susan Currie Sivek (@SusanCS) and Maddie Johannsen (@MaddieJ).
Special thanks to Ian Stonehouse for the theme music track, and @TaraM for our album artwork.