Data Science Mixer

Tune in for data science and cocktails.
MaddieJ
Alteryx Community Team

Crucial in the allocation of federal funding, US census data is a valuable national asset. In partnership with Reveal Global Consulting, we learn how the US Census Bureau has used satellite imagery analysis to optimize labor-intensive manual surveys and data processing.

 

 


Cocktail Conversation

 

What's your favorite example of a previously manual, time-consuming, awful, tedious process that you've used your data powers to automate or streamline? Share your success story!

 

Join the conversation by commenting below!

 




Episode Transcription

SUSAN: 00:00

I'm a sucker for cool gadgets and space stuff and for data science, of course. Plus I love efficiency and streamlining boring and complicated tasks. A friend once described me as always one productivity app away from achieving nirvana. So when I heard about how the US Census was now using satellite photos, image classification, and Alteryx to automate the gathering and analysis of data used for construction surveys and metrics of economic growth, I was like, "Tell me more." [music] This is such an interesting example of combining different data sources and automating data collection, analysis, and reporting. And it's got a dash of creativity that will inspire you too. Welcome to Data Science Mixer, a podcast featuring top experts in lively and informative conversations that will change the way you do data science. I'm Susan Currie Sivek, senior data science journalist for the Alteryx Community. I'm so excited to bring you, in today's show, not just one but three other awesome data people who will tell us about this project. First, let's meet Andy. [music]

ANDY: 01:10

Andy MacIsaac. I am the solutions marketing director for the public sector here at Alteryx. He/him. I'm located here now in Massachusetts, and I love working with the public sector, which includes working with some great partners like Reveal Global Consulting, who we're going to hear from today.

SUSAN: 01:27

From Reveal Global Consulting, I also want you to meet Hector Ferronato. By the way, be sure to stay tuned throughout this episode to hear Hector's story. It is honestly one of the most amazing personal journeys into a data science career that I have ever heard. You won't want to miss it.

HECTOR: 01:43

It's a pleasure to be here, guys. My name is Hector Ferronato. I'm the director of technology at Reveal. We've been working with Alteryx on many projects together, and it's a pleasure to come in here and talk about some of the solutions that, together with Alteryx, have made our lives and our clients' lives a lot easier. So my background is in computer science and economics, but I have become more of a data scientist and really fell in love with machine learning this past year. So it's great to be here.

SUSAN: 02:11

Before we get into my conversation with Andy and Hector, where we'll dive into the mechanics of this awesome satellite imagery project, let's get some background. [music] During the Alteryx Inspire 2021 conference, Andy chatted with Stephanie Studds, chief of the Economic Indicators Division of the US Census Bureau. She told the conference audience about what her division does and about how the construction surveys used to work and how data science and automation have made a huge difference in that process.

STEPHANIE: 02:42

In the morning, when you pick up your favorite headline from your favorite news source or you're reading tweets on Twitter, many people see information related to the international trade program, to the deficit, new home sales, retail sales, or manufacturing data, where the US stands in manufacturing goods. We are the most rapid, reliable source of data produced by the federal government sector on these topics. These principal federal economic indicators provide real-time measures of the US economy. These indicators come out weekly, monthly, or quarterly here at the US Census Bureau.

SUSAN: 03:20

The indicators combine a number of different data sources, for example, construction information and building permits. All of that information comes from different agencies and could be in very different formats. The old process of pulling together all those data sounds super tedious. Stephanie explains.

STEPHANIE: 03:38

Previous to the Reveal team joining us, our data was actually collected on mailed questionnaires, mailing packages put together and sent to building permit offices all over the country. More recently, we did move to some electronic reporting, but it's still very heavily being keyed into systems here and reviewed manually. Pretty much every form that a respondent sends back to us has manual components to it. If the data appears to be out of line, then we pick up the phone and we call those individuals to confirm the data. And then on top of that, we use highly expensive field staff to drive to our construction sites and collect data. The extensive cost of all this collection that I just spoke about, paper questionnaires being mailed, humans keying in those questionnaires, has truly limited the size of our samples for all of our construction surveys and thus limited the quality and the granularity of the data we can publish here at the Census Bureau.

SUSAN: 04:37

I don't know about you, but that process sounds pretty exhausting to me. Stephanie's team discovered there were other sources of third-party data that might provide some of the same insights more quickly and comprehensively if they could find the right approach and the partner to help build the right analytic process.

STEPHANIE: 04:54

The construction sector was rich with alternative data, which included satellite imagery. The availability of that data, once we started getting into it and seeing its capabilities, truly has revolutionized and can revolutionize the way we do our construction indicator programs. We are on very tight deadlines on a monthly indicator program. So we only have a few days of the month to do collection of that data. So the question was, could those vendors meet our exceedingly tight deadlines to get the data to Reveal and to the Census Bureau and actually make it work for the programs? And then we really needed skill sets to really understand this data. We knew the environment was rich and the possibilities of success were there. We also wanted a state-of-the-art methodology. New math methodologies behind our algorithms, our predictive modeling, and then our technology as well.

SUSAN: 05:49

So spoiler, the Census project has been a success. Stephanie told us that the Bureau is looking at ways to expand this approach and their partnership with Reveal to other areas, too, like the impact of natural disasters and the effects of pandemic aid on the economy. But what happened to make it work well? Let's get back to Andy and Hector now to find out more about how Reveal worked with the Census to develop an innovative time- and money-saving data science solution. But this is Data Science Mixer, so we have to talk drinks first. Here's Andy's beverage of choice.

ANDY: 06:23

I have fallen in love with Diet Snapple raspberry iced tea. It's my drink of the summer. Given the time we're recording this, my other drink would be what I would call a Kentucky mule. But it's still early in the afternoon, so that would be a mule with Bulleit bourbon. So we'll stick to the iced tea right now.

HECTOR: 06:43

That sounds pretty good, actually.

SUSAN: 06:45

Yeah, for sure. For sure.

HECTOR: 06:47

I'm enjoying my chai, actually. I'm a coffee guy, not much of a tea guy. So I'm trying to switch a little bit recently. But again, like Andy, if this was a little later, I'd probably be enjoying some beer, some Brazilian beer.

SUSAN: 07:02

Oh, yeah. Nice. What's your favorite Brazilian beer?

HECTOR: 07:04

It's called Brahma. Brahma. So that's like a lager, like a light lager.

SUSAN: 07:08

Nice. Sounds good. Yeah. Ironically, we wanted to have the whole cocktail theme for the podcast, and I think we've only done one recording that was late enough in the day where everybody felt comfortable actually having alcohol. So usually we're all having teas or coffees or something. I'm having a raspberry mojito herbal tea. Unfortunately, it's not actually a mojito, but we'll do what we can at 10 o'clock in the morning here in Pacific Time. Now that we're hydrated, let's hear from Hector about how his team and the Census Bureau went about changing up data collection for this formerly tedious survey process.

HECTOR: 07:41

There's such a large amount of data to be collected and analyzed and processed that many who work in that area would usually decrease the speed of the processing or the scale of it. If you think about some jurisdictions that are not required to report some of this data, then we have to rely on statistics to infer what the data points coming from there would be. So how do we overcome those issues? We automate the collection of that data directly from the source. So for some indicators, it makes sense that you can see from satellite images what is going on down on the ground. So by the use of machine learning and deep learning models, you can analyze those images and come up with similar data points to what you'd be collecting from the surveys. So in that way, we're leveraging the power of analytics, not just for collecting and processing the data, but going further into creating the data straight from the ground truth, what we call the ground truth that you see on those images.

SUSAN: 08:42

Yeah. It's actual literal ground truth, in this case, it sounds like, with the construction imagery.

ANDY: 08:47

Yeah. What we're seeing in government specifically is that there's been such an increase in the availability of third-party geospatial data, satellite imagery and so forth. And I know in this particular case, the satellite imagery came from Airbus, but that's third-party commercially available data. So now the government doesn't necessarily have to create that data. They can now collect that data and utilize more efficient third-party sources to create a repository of geospatial images that the Census is able to leverage to really accelerate their analytic process. I think that's absolutely profound.

SUSAN: 09:25

Yeah, it's incredibly cool. That was one of the things that I thought was so interesting about this project and that I wanted to dive into a little bit more was just the way that you're bringing together all of these disparate data sources. It sounds like you're still using some of those traditional economic indicators, traditional data sources, but then also using the third-party data as Andy is describing. So what were some of the challenges in doing that and pulling together these different data sources? How did you go about tackling that, Hector?

HECTOR: 09:51

Yes, I think that's probably one of the biggest challenges on projects such as these. If you don't have good quality of data, not just the quality, but the number, the diversity of the data and the frequency, your overall results won't be as good. So I just described five different areas that you need to be looking at in your data points to make sure that your end result has quality. So for us, with these construction indicators for engineering, there are so many data points that we put together, and they will always include your previous reference data point, which is Census in this case. There's a transition period where you have to compare both versions of the indicators that you're creating, and then only when you have some certain threshold of confidence can you start dropping, for example, the more manual processes. But you always use that reference point, in terms of that ground truth that you're trying to optimize. So it's almost like we're creating these transforms and getting the data from one point, applying all those rules and recipes and models, and then what comes out as a result is something very close or identical to what Census requires. So for all that process in the middle that I just described, from collecting to analyzing and processing, we leverage Alteryx a lot because you have those native tools that we can drag and drop. Instead of having a couple of really expert data scientists that are experts in their own languages and their own frameworks of analysis, we can now empower even business analysts and data analysts to do a higher level of work and more advanced work, almost to that level of expertise, by leveraging drag and drop capabilities while still keeping in use your native Python, your native R code that comes in from your data scientists that are more expert in a way. So that has helped us tremendously in achieving this effort.
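Editor's note: the transition period Hector describes, where the automated indicator is compared against the Census reference until confidence passes a threshold, could be sketched roughly as follows. This is only an illustration. The 5% tolerance, the three-month window, the sample values, and every function name are hypothetical; the episode does not state the actual criteria.

```python
def relative_error(estimate: float, reference: float) -> float:
    """Absolute relative difference between the automated estimate and the reference."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return abs(estimate - reference) / abs(reference)


def ready_to_retire_manual(estimates, references, tolerance=0.05, months_required=3):
    """Return True once the most recent `months_required` periods are all within `tolerance`.

    Both thresholds are illustrative assumptions, not Census criteria.
    """
    if len(estimates) != len(references):
        raise ValueError("series must have the same length")
    if len(estimates) < months_required:
        return False
    recent = zip(estimates[-months_required:], references[-months_required:])
    return all(relative_error(e, r) <= tolerance for e, r in recent)


# Hypothetical monthly indicator values: automated pipeline vs. survey-based reference.
automated = [980.0, 1010.0, 1195.0, 1302.0, 1248.0]
reference = [1100.0, 1050.0, 1200.0, 1290.0, 1260.0]
print(ready_to_retire_manual(automated, reference))
```

The early months disagree by more than the tolerance, so the manual process would stay in place until several consecutive periods line up with the reference.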

SUSAN: 11:39

Yeah, absolutely. And I think if I understood it correctly when I was looking at your workflow, it looked like your spatial analysis was using both data sets that are part of Alteryx, that can be included with Alteryx and other data that you were pulling in. So what was it like integrating those different sources of spatial data in particular?

HECTOR: 11:57

Yes, yes. So we have some of those coordinates, the more raw geospatial data that comes in with, let's say, your images. Your images contain your latitude and longitude points, and you can convert those into objects, into your polygons in Alteryx so that you can use those spatial tools there: spatial matches, creating the intersection layers and object radius. All of that we leverage from Alteryx. Once we pull it into Alteryx, you create that structure that you need. So it's really interesting how you're able to match, like you said, joining these data points that are almost completely unrelated, except for, let's say, being in the same location. So because we know it's the same location or center point, we can kind of stack these data points as a sandwich. I talked about the sandwich before, so you can stack them exactly to that level. You're enhancing, you're creating layers of data. So you have your image, then you have some of the property information and property boundaries. Then we add another layer, which is the cropping of those images to each property, and then each of those crops is what the model classifies. And then you get those results back, and that would be your final layer, identifying what's happening at each of those property boundaries. So it's amazing the complexity of how we push this, and we're really thankful that Alteryx allows us to actually connect, let's say, to our models via API. So you don't need to run really intensive GPU tasks using the Alteryx workflow. You can call it via API and get those results back, just as if you were running on that workflow itself.
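Editor's note: the "sandwich" of layers Hector describes (image, property boundaries, per-property crops, classification results) can be illustrated with a small self-contained sketch. It is not Reveal's implementation. Property polygons are simplified to bounding boxes, the image is a tiny grid of brightness values, and `classify_crop` is a hypothetical stand-in for the model that the real workflow calls over an API.

```python
from dataclasses import dataclass


@dataclass
class Property:
    parcel_id: str
    bbox: tuple  # (min_lon, min_lat, max_lon, max_lat) in degrees


def to_pixel(lon, lat, image_bounds, width, height):
    """Map a lon/lat pair to (col, row) in a north-up image covering image_bounds."""
    min_lon, min_lat, max_lon, max_lat = image_bounds
    col = int((lon - min_lon) / (max_lon - min_lon) * width)
    row = int((max_lat - lat) / (max_lat - min_lat) * height)
    # Clamp so points on or beyond the edge still give a valid slice index.
    return min(max(col, 0), width), min(max(row, 0), height)


def crop_to_property(image, image_bounds, prop):
    """Cut out the pixel window that covers one property's bounding box."""
    height, width = len(image), len(image[0])
    min_lon, min_lat, max_lon, max_lat = prop.bbox
    left, top = to_pixel(min_lon, max_lat, image_bounds, width, height)
    right, bottom = to_pixel(max_lon, min_lat, image_bounds, width, height)
    return [row[left:right] for row in image[top:bottom]]


def classify_crop(crop):
    """Hypothetical stand-in for the remote model: labels a crop by mean brightness."""
    pixels = [p for row in crop for p in row]
    if not pixels:
        raise ValueError("crop is empty")
    mean = sum(pixels) / len(pixels)
    return "under_construction" if mean > 0.5 else "no_activity"


# Layer 1: a 4x4 image covering lon 0..4, lat 0..4 (one pixel per degree).
bounds = (0.0, 0.0, 4.0, 4.0)
image = [
    [0.9, 0.9, 0.1, 0.1],
    [0.9, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
# Layer 2: property boundaries that share the image's coordinate system.
properties = [
    Property("A", (0.0, 2.0, 2.0, 4.0)),
    Property("B", (2.0, 0.0, 4.0, 2.0)),
]
# Layers 3 and 4: crop per property, then classify each crop.
results = {p.parcel_id: classify_crop(crop_to_property(image, bounds, p)) for p in properties}
print(results)
```

The shared coordinate system is what lets otherwise unrelated layers be stacked: every layer is keyed to the same location, so each crop can be tied back to a parcel.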

SUSAN: 13:28

Nice. And now with the image processing portion of the workflow, I assume a lot of that was the custom code that you're referring to, that the data science folks had developed separately. Can we talk a little bit about that? Because it seems to me, at least from the outside here looking in, that working with those images, potentially somewhat fuzzy satellite images of buildings in progress and so forth, that that could be a pretty challenging image classification task. Could you talk a little bit about how that developed and how you were able to tackle that successfully?

HECTOR: 14:01

Absolutely. Absolutely. So a lot of these machine learning models are getting more and more advanced in terms of how well they can match our human eyes in describing and either categorizing or classifying or localizing an object, identifying it. So we're leveraging those capabilities by training models, and with machine learning and data analysis, the quality of your models really comes down to how you collect the data and organize the data. This use case required us to learn a lot about image classification and learn the different types of models that perform that. So that's where our data scientists really shined in identifying and prototyping these models and building from the ground up. So you start with a simpler question in a way. I can't really describe what you're asking in those questions, but you can think of it this way: you design your first model to answer some of these questions, and then you make them more complicated in terms of what they're describing. But again, it really comes down to your data collection. And you only know whether you have really good data collection after you've had the data, you've trained the model and you've tested the model, and then you have to go back and say, "Okay, actually I need to collect more data or I need to collect different data." So it's a cycle that's ever going. So even with this model we have now, we know that it's only going to improve over time because we'll see more data, we'll see more scenarios, more outlier scenarios, as we call them, because that's really where the model learns, by seeing more and more possibilities and learning from them.

SUSAN: 15:30

Absolutely. And so for the image analysis portion here, is there any other detail you can provide on that?

HECTOR: 15:39

Yes, yeah. We're using CNNs. So I think that's as far as I can describe. So we're using some CNNs to take in those images and learn some of the indicators. And the way we phrase them is to match exactly what Census needs. So everything is really well orchestrated because if you work in isolation, then you might be building something that really doesn't match. So I would say the secret of the sauce is the whole process really together. It's not just the model because you need all that information you collected and all the process to know how you're defining the model, how you're defining that data collection. And the whole process might be repeating itself 10 times until you get it right. So it's really interesting to see. And we're also thankful to Census for being innovative, really, and their team is amazing. It's not for nothing that the US has such a great Census agency. It's the talent of the team there and the effort that they put in to create these data points that are so important for our economy.

ANDY: 16:35

Yeah. And I know in talking to Stephanie at US Census earlier, she really talked about how important it was to find the right partner and have that right partnership. And I think this is a perfect example of an agency that wants to be innovative, wants to really think outside the box, and then working together with the folks at Reveal and leveraging Alteryx to put together a really strong partnership to tackle the problem together.

SUSAN: 17:01

Yeah, yeah. Very much so. Yeah. And Hector, I love your point too, that the secret sauce is kind of the entire process, that willingness to be able to continue to refine. I get the feeling that for a lot of folks who are starting out in data science, doing Kaggle projects and things like that, it's like, "Well, I get the data. I come up with the model, I submit it. It's done." But obviously, that's not the case. There's this constant cyclical process that you're very much engaged in.

HECTOR: 17:26

That's right. That's right. It's one thing for you to train your model or any type of model and you see those results from your training and they seem really exciting. And then you actually go in and get data from the real world, from production or semi-production, and that's where things start to change because that's where you're actually testing your model outside of what you trained it on, even with images similar to the ones you trained it on. So there's definitely that approach. And I think when we're learning data science, we sometimes underestimate the little details that come up before you even touch the data or train. We always talked about this in object-oriented programming as well for computer science, where it's not a cliché that you actually have to just sit down and think before you program because you need to see the whole architecture together and think that through, because otherwise you're two or three steps deep into a process and then you figure, "Oh, actually, no. I have to go back and change something. It's not the right format." So those are the kind of delays that you don't want happening. So you really want to think of the whole architecture. The small assumptions you make in the beginning when you define a problem matter, and defining your problem really is something that you spend a lot of time on as well. Because you can go-- you can solve something so many ways. So how you define your problem will almost dictate your success on it, really, because depending on how you define it, you're not going to be able to solve it. But if you define it in a different way, you might be able to solve it even quicker than you had before.

SUSAN: 18:55

That's great. Yeah. So what kind of happened at the end of this process? So we've talked a little bit about looking at the images, integrating the data. What does that look like then on the other end of the process when you've got predictions for these various images, you're integrating that back with all of your other data, coming up with estimates of the construction activity? What happens then?

HECTOR: 19:18

That's a great point. And at that moment, really, we need to integrate it back into the Census environment, and that way we report these numbers to them every month in an aggregated manner. So depending on the locations that we're watching and tracking, that's how we aggregate the results back from the classifications. And one really important step is the manual processing of checking. So even though we're automating a lot of this process, there will always be some validation involved. And a lot of times that comes from us who created the model. So it's interesting because you can outsource the labeling of some of these images, but if you don't have the expertise related to your specific problem, a lot of times the quality of the labels that will come to you won't match what you want. So you need to get your hands dirty. So we see this in our team: our senior data scientists and a lot of our analysts come together and we, ourselves, actually select some samples from the results and manually check them. It's really, really important because that's the only way you can guarantee quality and correct, along the way, the things that you need to improve.
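Editor's note: the two steps Hector mentions here, aggregating classifications by location and pulling samples for manual checking, might look something like the sketch below. The record layout, label names, and sample size are hypothetical, chosen only to show the idea.

```python
import random
from collections import Counter, defaultdict


def aggregate_by_location(predictions):
    """Count predicted labels per location from (location, parcel_id, label) records."""
    totals = defaultdict(Counter)
    for location, _parcel_id, label in predictions:
        totals[location][label] += 1
    return {loc: dict(counts) for loc, counts in totals.items()}


def sample_for_review(predictions, k, seed=0):
    """Draw a reproducible random sample of predictions for manual checking."""
    rng = random.Random(seed)
    return rng.sample(predictions, min(k, len(predictions)))


def agreement_rate(reviewed):
    """Share of reviewed (model_label, human_label) pairs that match."""
    if not reviewed:
        raise ValueError("nothing was reviewed")
    matches = sum(1 for model_label, human_label in reviewed if model_label == human_label)
    return matches / len(reviewed)


# Hypothetical model output for one reporting month.
predictions = [
    ("County A", "p1", "under_construction"),
    ("County A", "p2", "completed"),
    ("County A", "p3", "under_construction"),
    ("County B", "p4", "completed"),
]
monthly_report = aggregate_by_location(predictions)
to_check = sample_for_review(predictions, k=2)
print(monthly_report)
print(len(to_check), "records selected for manual review")
```

Tracking the agreement rate from each review round gives a running signal of where the model or the labels need correcting.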

SUSAN: 20:23

Yeah, that's the less glamorous side of data science, the cleaning and then the checking afterward. Absolutely. And then I assume--

HECTOR: 20:31

And analyzing 100 images for a week straight will get you seeing construction or seeing images everywhere. You're walking in a supermarket and you're like, "Oh, this looks like a construction." Sorry.

SUSAN: 20:43

That's awesome. Just becomes part of your everyday vision there. That's so funny.

HECTOR: 20:48

Yeah.

SUSAN: 20:52

[music] Let's take a quick break. When we come back, we'll hear about the positive impact this new approach has had on the US Census. [music] Hey, Data Science Mixer listeners, this is just a quick reminder that we want you to join this chat. Bring your favorite beverage and join us on the Alteryx Community for our cocktail conversation. With every episode, we have a question that we hope will be thought-provoking for anyone involved in data. We invite you to come hang out for a casual discussion on the Alteryx Community. You can find it at community.alteryx.com/podcast and click on the Data Science Mixer episodes. You don't have to be an Alteryx user to join the community and come learn from our data science resources and chat with other awesome data-minded people. We may even feature your comment in an upcoming episode. Here's this week's question to think about. I'll remind you again at the end of the show. Our question for this week is what's your favorite example of a previously manual, time-consuming, awful, tedious process that you've helped use your data powers to automate or streamline? We all love hearing those success stories. So be sure to share yours with us and learn from others. Again, join that cocktail conversation about your favorite story of the tedious made awesome at community.alteryx.com/podcast. [music] What are some of the effects that the Census has seen from having this be a much more efficient, automated kind of process?

HECTOR: 22:25

I would say, and Andy, you can complement based on what-- we have had so many conversations with Stephanie about the way she sees this improving their processes. But from our close teammates at Census, we see that it's allowing them to actually spend more of their time on developing new features or increasing the quality assurance, for example, by cutting the time spent on the manually intensive processes. And there is a lot of power in that because instead of repetition and things you could automate, you now spend quality time, we call it, on quality assurance or creation of new features, expansion. So time is invaluable. And I would say that's really a big part of the value gained.

ANDY: 23:05

Yeah. And I think from Census, as Hector talked about, the upskilling is vitally important. Most federal government agencies don't have enough resources. And so they want to make those resources as effective as possible, allow them to focus on higher value work. And that's what Census is doing. They're able to do that now and the innovation agenda can just continue to expand at US Census. We know Stephanie is really big on that and continuing that. And I think from a larger US federal perspective, what Census is doing is starting to attract some attention because this is a really strong example of automation with unstructured data, leveraging things like satellite imagery and, as I talked about, the greater availability of that. So other agencies are starting to look at that. Some of the other use cases that come out of this are tracking deforestation or wildfires or floods or all kinds of things that can be better tracked and better automated utilizing things like the analysis of unstructured data, like satellite imagery.

HECTOR: 24:08

That is right. And to complement that, Andy, if I can, there's something new and interesting here too-- like I said, because before, you have to trust a lot in your statistics and sometimes you just don't have that possibility to collect the data. I was going to mention actually, it just came to me, the COVID situation. So all of a sudden, you don't have that opportunity to go out anymore in the field, or at least you have a reduction in those possibilities. So what happens? Do you not get the data? That's not a possibility. You have to get the data. So we see this project as being creative in terms of we'll get the data no matter what. Everything could be closed down. Unless there's a permanent cloud and we can't see through it, we'll get that data. Or if someone is holding a large mirror or some type of picture trying to cover their houses or their building, that's the only way they can fake this. Nothing here to add, just a mirror. So in that way we're creating new data points, truly new data points, a new way of looking at that data and getting that data. I think that's something really valuable. And then another point is that data is getting cheaper by the day. So all these processes, even satellite images, are only now getting a little bit cheaper and more accessible because you can see more and more satellites being launched every day into orbit. So those costs will come down. So when you think about the manually intensive processes versus the new type of automation over time, that's going to cross the line where one is clearly optimal over the other in terms of costs. So this is something that's really looking to the future to leverage the power of data and ride that wave. So I think it is something that's permeating into the federal government, as Andy said. We see this through TOPS. TOPS is The Opportunity Project. This is from Census as well. It's a federal challenge for teams that sign up to create new solutions using federal open source data sets.

HECTOR: 26:05

So we participated actually last year with Alteryx to create what's called the disaster relief section inside TOPS. So we leveraged this type of satellite imagery and data analysis to compare areas that were hit by disasters such as fire. In our case, we actually did the wildfires in California and analyzed some areas where you can clearly see, before and afterwards, that the houses are just completely burned out. So what we did was crop those images and analyze the percent of the area that was damaged. So you can think of it in terms of how many houses were destroyed or partially damaged. So we quantified that damage. And because we also took a picture afterwards, two or three years afterwards when some of it was recovered, we were able to compute how much recovery happened as well and tie that into the budget that was allocated from FEMA and other federal agencies to that area so we can calculate the efficiency of that disaster relief. That's an example of how these-- you can just keep getting more creative. And I think this is just the beginning. That's what I'll say.
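Editor's note: the arithmetic behind the damage and recovery comparison Hector outlines can be sketched as below. This is a simplified illustration, not the TOPS project's method. It assumes each cropped image has already been reduced to a binary grid (1 for intact structure, 0 otherwise), and the grids, the budget figure, and the "recovery per million dollars" measure are all hypothetical.

```python
def built_fraction(mask):
    """Fraction of cells in a binary grid marked as intact structure (1)."""
    cells = [c for row in mask for c in row]
    return sum(cells) / len(cells)


def damage_and_recovery(before, after_event, later):
    """Return (share of structures lost, share of that loss later rebuilt)."""
    before_frac = built_fraction(before)
    after_frac = built_fraction(after_event)
    later_frac = built_fraction(later)
    if before_frac == 0:
        raise ValueError("no structures in the 'before' image")
    lost = before_frac - after_frac
    damaged_share = lost / before_frac
    recovered_share = (later_frac - after_frac) / lost if lost > 0 else 0.0
    return damaged_share, recovered_share


def recovery_per_million(recovered_share, budget_millions):
    """Illustrative efficiency measure: recovered share per million dollars allocated."""
    return recovered_share / budget_millions


# Hypothetical masks for one cropped area at three points in time.
before = [[1, 1, 1, 1], [1, 1, 1, 1]]
after_fire = [[1, 0, 0, 0], [1, 0, 0, 0]]
years_later = [[1, 1, 0, 0], [1, 1, 1, 0]]

damaged, recovered = damage_and_recovery(before, after_fire, years_later)
print(f"damaged: {damaged:.0%}, recovered: {recovered:.0%}")
print(recovery_per_million(recovered, budget_millions=12.5))
```

Comparing the same area at three dates is what turns a single damage estimate into a recovery rate that can be set against the relief budget.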

SUSAN: 27:14

Yeah. It's really exciting. This is really a process of coming up with new and innovative ways of using these exciting new tools that we have available to us. So I'm sure folks will come up with all kinds of creative possibilities. Awesome.

HECTOR: 27:28

And using some interesting ones. Sorry, Susan, I was just going to say because there's a lot of-- so let me just go up a little bit in my thinking here. If you think in terms of technology as a whole and how we're evolving generation by generation, I think in the past, I don't know, maybe 50 years, we've gone into-- we've been in a certain era of knowledge and discovery. And now the next 50 to 100 years are going to be exponentially more advanced because now really what helps you make good decisions and discoveries is the amount of data you can look at. Think about experiments, how long it took people back then to create methods and create processes and how much faster it is for us to do it now. So you'll see some amazing, amazing innovations coming up in these coming years that will allow us to hopefully do good for the world. I think that's something I really try to keep in my heart: to talk to people about it, like you guys, so that this conversation we're having here hopefully will inspire other people to always start using these technologies for good because there's always the potential for bad as well. So I think we have to try for that balance as well and try to come up with solutions that, of course, will help the economy, will help the private sector, but that could also help the overall population in the world.

ANDY: 28:46

Yeah, and I think Susan touched on a key word here, creativity. And going back to upskilling, by relieving people of these manual processes, they're able to apply their creativity, their passion, their domain expertise to more high-level problem solving. At Alteryx we start with the thrill of solving. Well, we're able to enable that with upskilling; automating a lot of the manual processes really allows that creativity to take hold. And I think that's absolutely vital for any organization, but specifically for government, to tackle some of those big world problems that are going to impact people's lives in a positive manner. So that's why I have a passion for working in the public sector because technology, things like analytics and process automation, enable that to happen. And I think that's really the most important aspect of it.

SUSAN: 29:38

Absolutely. And just to go even higher level again here, because these kinds of conversations are always fun. I came from a background way back in my undergrad in English lit and creative writing kinds of stuff. And I think people tend to think of data and quantitative fields as not especially creative. But I think this conversation is totally showing how there is immense opportunity for creative thinking in using these tools in new and exciting ways that really can have these social benefits. So it's very exciting. Cool, cool.

ANDY: 30:09

Hey, data science can also be the realm of many liberal arts majors. I'm a history major myself, so I think it's important. Yeah, what analytics is enabling is people of all kinds of backgrounds. It takes that diversity. It's not all statisticians and things like that. They're important but you can have some very capable data people who have diverse backgrounds maybe in the liberal arts, because they're going to be able to tell the stories from the data. I think that's critical as well.

SUSAN: 30:39

Love it.

HECTOR: 30:40

That's a good team, actually, because I also went to a liberal arts college. It's called Franklin & Marshall College in Lancaster, PA. It's very small, but they're really geared towards critical thinking. And I think that's essential. More than what you're learning is how you're thinking, how do you learn anything. I think if you can learn that, you'll learn anything.

ANDY: 30:59

Exactly.

SUSAN: 31:00

That's great. Now, just as a side note for you, Hector, I think Andy already knows this, but I was actually a liberal arts college professor before I started with Alteryx. So I was doing that for about a decade before I started this job. And so it's like, "Oh, liberal arts. Yay. Totally just touched my heart there, Andy."

ANDY: 31:16

I love the liberal arts.

SUSAN: 31:18

Good stuff. Good stuff. Cool. So Hector, I have one more question for you and, Andy, feel free to jump on it as well if you like. But one thing that we always ask on the show, we call this the alternative hypothesis, we like to ask people what's something that people often think is true about data science or about being a data scientist, but that you have found to be false?

HECTOR: 31:41

I love this, actually. It's making me think. I would say I would think it's definitely the concept that the models are just there for you to plug and play and that you just call in three lines of code and here's your result. The model, it's almost like a recipe, again, you have to think of as a recipe and the actual quality is on the ingredients. So I think I love making the reference to culinary to data science. Yeah, I think I'm hungry, actually. [crosstalk]. But that's how I think. So obviously your recipe is equally as important, but not the most important. So I think there's a concept that, "Oh, data science is all about the models, or at least that the models are already perfect and they're ready." But it's really about the whole process for you there. So that's what I would describe.

ANDY: 32:36

Yeah, I think I would describe it that data science isn't the exclusive domain of the statistician or the classically trained-- those people are important. The people that can write code, they're important. But data science today is a team sport. Making effective data science or effective models takes a whole team to really leverage and turn into action the insights that are gleaned from data science processes. And I think that's really the thing that we're seeing. It's really more of a team sport than it ever was before.

SUSAN: 33:14

And just to really mix up our metaphors here for the food and the team, it's making me think of a restaurant. It's like the whole experience of the story that's being told, the experience that people are having, how everyone is working together. So yeah, maybe that's going back to my experience waiting tables way back when. But yeah, awesome. Great.

HECTOR: 33:32

Yeah, because you can go to a place and one place could have better food than the other. But if one place has a little bit better experience in terms of sitting and location and how they treat you there, then that's better. It's not just the food. So I like that analogy.

SUSAN: 33:51

[music] We must definitely be coming up to lunchtime on Eastern Time everywhere. [music]

ANDY: 34:00

So Hector, how did you get into this? Let's talk a little bit about that. How--

HECTOR: 34:05

I'd love to share a little bit of my story, actually, if you guys allow me.

SUSAN: 34:09

Absolutely.

HECTOR: 34:10

So as you can tell by my accent, or maybe not, but I'm from Brazil and I lived my whole life in Brazil. And actually so it's a long story, but I'll make it short. More like a countryside in Brazil. My family had very humble beginnings and I always wanted to help them since I was young. And I just really applied myself in school going through the public schools there, studying a lot. And when I got to high school, they actually gave me a reward for being the best student. And I came to the US for everything paid to learn English for a month as an exchange student. So they sponsored my trip in 2011, I went to Portland, Oregon, and that's where I was learning English still back then. And I still am. But back then I was really learning English and what that trip allowed me to do was really think that everything was worth it, that I was studying so hard through all those years in school, even though I didn't know what was coming next. But that experience allowed me to know that education would change my life, and it did. So I went back and I wanted it-- actually, I learned that I could get scholarships into American universities or colleges, which I didn't know before. As I told you, my family wouldn't be able to pay for my college expenses here. So I would need a lot of scholarships, anything more than full for the flights. So what happened was I spent a whole year studying for college applications in Brazil, US college applications, learning the whole SAT and the whole process back then. And I did everything by myself, learning and doing the essays and asking people using social media how they got in here before and asking them to help correct my essays. So short story, I applied for a bunch of colleges. I got accepted into three of them. Franklin & Marshall College was one of them, gave me a 90% grant. So almost as full as a full ride. And I was really happy. I'm crying, talking to my family. I had gotten into college, but I said, "Look, but we don't have the 10%. So we got to figure out something."

HECTOR: 36:10

So as soon as that happened, I wouldn't give up because that was my dream to come here, have a good education, be able to get a good job and help my family back in Brazil. So at that moment, I started selling T-shirts on the streets with educational phrases on them. And I sold a bunch of these shirts kind of trying to raise the 10% money that I still needed to fund the college. And that actually got me into some interviews with TV. So I went through some TV channels and they were interviewing my story that I'm just telling you now, going through public schools that are really tough in Brazil because the quality isn't as good. And one of these TV channels actually was a game show, like Minute to Win It. So they invited me to go to this game show. And if I won it, I would get all the money I needed left to come to college. Yeah, believe it or not, that's the-- I wasn't believing. I said, "Are you serious?" I was shaking and I said, "Yes, of course." And I had just a few days before the deadline for-- because you have to sign in your college if you're getting in or not, and you need to prove you have that money beforehand. So I went to this TV game show. I did the challenge and I thought it would be intellectual challenges, but they were all physical. So I had to balance things and I'm really clumsy. So there's one of these challenges, you had a loose wheel that you had to roll into a bicycle rack many meters apart. And I was so shaky. No one could believe it, but I got all these challenges in the first try. I was amazed. And my dad just came in jumping and hugging me on the stage. It was one of the most amazing moments in my life, and for many reasons because I knew I was opening so many doors by coming here to the US and going to college. But it was also showing me that I was right. Education was the way, and I chose something that it was really hard at times because all my friends would go out and I would always be the one staying in and studying. And I had to give up a lot of things to do it, but it was worth it. So at that moment, that was 2013, I came to college at Franklin & Marshall in Pennsylvania.

HECTOR: 38:09

Stayed there for four years. I got double major, computer science and economics undergrad. Started working on a nonprofit teaching computer science and that evolved into getting to know Reveal. And I came in actually when we were 6 people and now we're almost reaching 60. So we're growing really exponentially.

ANDY: 38:26

Awesome.

HECTOR: 38:26

Yeah, and but that's the-- sorry, guys. I wanted to make sure-- I get a little emotional. So I'm sorry. [crosstalk].

ANDY: 38:33

Yeah. That's a great story, Hector. Great for sharing that.

SUSAN: 38:42

[music] Thanks for tuning in to this Data Science Mixer episode with Andy MacIsaac and Hector Ferronato. For more from Reveal and to learn more about how the public sector is using Alteryx to find breakthroughs in their data, check out our show notes at community.alteryx.com/podcast, where you'll find links to more resources. Also, be sure to join us on the Alteryx Community for this week's cocktail conversation to share your thoughts. Our question for this week is, what's your favorite example of a previously manual, time consuming, awful, tedious process that you've used your data powers to help automate or streamline? We love hearing those success stories. So be sure to share yours with us and learn from others. Leave a comment directly on the episode page at community.alteryx.com/podcast or post on social media with the hashtag Data Science Mixer. Cheers. [music] Cool. Anything else? I was waiting, Hector, for you to talk about how--

HECTOR: 39:53

I just love Alteryx.

SUSAN: 39:55

Well, we love to hear that. That's terrific. I was going to say I was waiting to hear how you used analytics to refine your T-shirt selling methods, but I think that might have been a bit [inaudible] for that.

HECTOR: 40:07

Yes, if I knew back then, I would use it for sure. And there's a piece to this as well where talking about these-- I love talking about the workflows as much as I love developing them and sharing with you guys here today. I used to go back every year to give lectures to public schools. So I've talked to more than 10,000 students for public schools, sharing my story and motivating them to believe in education. So I love speaking, as you can tell, especially when it comes to analytics.

SUSAN: 40:36

That's fantastic.

ANDY: 40:37

Well, that's great.


 

This episode of Data Science Mixer was produced by Susan Currie Sivek (@SusanCS) and Maddie Johannsen (@MaddieJ).
Special thanks to Ian Stonehouse for the theme music track, and @TaraM for our album artwork.