BONUS: Data Science Mixer Podcast | Renee Teate

Renee Teate shared her career move into data science publicly through social media and her podcast, Becoming a Data Scientist. Today, Renee — now director of data science at HelioCampus — joins us for a special video episode of our Data Science Mixer podcast to share her experience and advice for others who want to deepen their data science knowledge and advance their careers.

This episode originally premiered on Data Science Mixer on May 19, 2021. The Data Science Mixer podcast lives on its own feed on Apple Podcasts, Spotify, or right here on the Alteryx Community, so in order to hear more incredible episodes like this one, search specifically for "Data Science Mixer" and subscribe.

Panelists

* Renee Teate - LinkedIn, Twitter, Website
* Susan Currie Sivek - @SusanCS, LinkedIn, Twitter

Topics

* Alteryx Inspire 2021
* HelioCampus
* Renee’s “Becoming a Data Scientist” podcast
* Renee’s Live Twitter Q&A during Inspire episode premiere
* Seeing the Forest for the Trees: An Introduction to Random Forest - blog by @SydneyF
* Predictive modeling interactive lesson
* Powering the student experience with data science | Danielle Lyles - Data Science Mixer podcast episode
* Do Charts Lie? A Conversation with Data Visualization Expert Alberto Cairo - Data Science Mixer podcast episode
* What is a Confusion Matrix? – blog by @joshuaburkhow
* De-Confusing the Matrix – blog by @OllieClarke

Transcript

SUSAN: 00:01

Hello, listeners. Do you ever feel like the length of your list of things to learn in data science is rapidly approaching infinity? Data Science Mixer is here to offer some comfort and guide you back toward sanity. I'm excited to share this awesome and motivating interview today that originally debuted as a video session at the Inspire Conference hosted by Alteryx. But this episode is the full and complete version with still more great conversation. Let's jump right in. [music] Hello everyone, and thank you so much for joining us today for this special episode of Data Science Mixer, the podcast from Alteryx where we talk to top experts in lively and informative conversations that will change the way you do data science. I'm Susan Currie Sivek, the data science journalist for the Alteryx Community. I'm delighted to have with us today Renee Teate, the director of data science for higher education analytics firm, HelioCampus. I have to admit that when I think of Renee, I think of her as data science Renee, or becoming data sci, because that's how she's known on Twitter, where she has shared her journey into a successful data science career. You might also know Renee through her blog and podcast titled Becoming a Data Scientist. I'm excited to hear from Renee all about her career adventures. She'll share what she's learned from her own journey and the wisdom she's gathered from others along the way. I know she'll have fantastic insights that will help all of us continue to advance our own data science knowledge and careers. Renee, thank you so much for joining us.

RENEE: 01:28

Thanks for having me.

SUSAN: 01:30

Awesome. So I alluded to your data science journey here and your path into your current career and position at HelioCampus. Could you give us kind of the nutshell version of what your career journey has been?

RENEE: 01:42

Sure. Well, when I finished my undergraduate degree, it was kind of a generalist degree in integrated science and technology. So I had a little bit of background in a lot of different science and math type of courses. And I went to James Madison University and I ended up just doing database design, website design for small businesses. So kind of independent consulting. And I realized throughout that that I really liked the database part of it, relational database design. So I started kind of throughout my career working more and more with databases, becoming a data analyst. And then I think it was about eight years after undergrad, I went back to school and I got a master's degree in systems engineering. So it's another generalist degree. And it ended up being a lot more math than I expected. But it did really teach me a lot about some advanced algorithms and things that I didn't know about before. And then I taught myself Python and some data science machine learning techniques and then started kind of transitioning from data analyst to data scientist.

SUSAN: 02:48

Very cool. And your current position right now, could you tell us a little bit about that?

RENEE: 02:53

Sure, I'm the director of data science at HelioCampus. So I started at HelioCampus about five years ago and that was my transitional role going from data analyst into data scientist. So that was my first job where I was really full-time doing machine learning kind of end to end working with SQL, working with Python, building dashboards and doing all of the work that comes with both data analysis and data science. There were several of us at the company. It's a startup, as you mentioned, working with university analytics, but we were each working independently. So each of us were assigned to different clients and we were kind of working solo. So since then, we've transitioned into working more like a team, centralizing our work, sharing the results of our work. And that's been a transition that I've been a part of. And so throughout that, I became the director of the new team. So I'm the director of data science. There's four of us on the team. We call it DS Ops, which is data science operations, products and services.

SUSAN: 03:55

Okay, awesome. Yeah, and I would love to maybe later hear a little bit about some of the projects that you're working on. Curious, though about this transition into data science. If you were looking back to the Renee who was starting this journey, who was moving from that data analyst position and starting all of this self teaching and so forth, do you have any advice that you would give to her for how this journey was going to go and things to do along the way?

RENEE: 04:20

Yeah, I would mostly say things that I was already good at are the things that have really helped me in my role as a data scientist. So I felt a little bit out of my element and like I had so much to learn and I have learned a lot. But being able to work with data, talk to people about the data, kind of work with stakeholders, explain the results of my analysis, talk to people at the universities and colleges, talk to people in different teams within the company, that communication side really helped me move into a leadership role, work with the clients. And so it wasn't the technical piece that ended up being my strongest element, but those other aspects of having a career working with data, even if it wasn't in a machine learning type of capacity, really enhanced my ability to make myself valuable in my role and to work with a lot of different teams. And so I guess to summarize the advice that you already have a lot of what you need. So don't worry too much. You'll get the technical skills. Those are easier to pick up, I think.

SUSAN: 05:31

And that's so interesting to hear you say, because I feel like for a lot of people, those technical skills would be really daunting. The idea that, oh, you're going to teach yourself Python, you've taken on so much additional math that you've learned. But I like hearing you say that really it was about recognizing your existing experience and strengths, it sounds like.

RENEE: 05:50

Yeah, and also being able to eventually work with a team that really helped because I can lean on others for some of the gaps in my knowledge, I can learn techniques from other people. I've learned techniques as I've gone through my job. And I just try to encourage people that are daunted about getting into data science that there's so much you can learn on the job and there's so much that no matter how much background work you do, you're going to learn more in the job anyway. So it's not like you're ever done learning to become a data scientist and then you can get a job as one. I think a lot of people think that they have to learn all the topics on their list before they start applying for jobs. And there's really a huge need for people with probably the skills that you have now. So if you can find a transitional role that lets you take advantage of the skills you already have and grow the skills while on the job, that's really a good set up if you can get into that kind of role.

SUSAN: 06:46

Perfect. Yeah, that's awesome. So as you were going through all of this learning, were there particular concepts or particular techniques as you were learning data science that just really stood out to you, that were particularly exciting to you, maybe an algorithm or an area of application where you were just like, oh, this is one of my favorite things now?

RENEE: 07:04

I think one thing that has become my go-to algorithm is the random forest algorithm. So it was one that early on I was trying different techniques and that one just always worked well enough. So it might not have always been the top result, but it was never a bad result. And so that has become kind of my go-to algorithm, at least for the type of data that I work with. So I would say random forest is my favorite and my first technique that I try each time. But I would say just in terms of what's exciting or interesting to me has been learning about all the different trade-offs that as you use each technique, there's so much iterative work and tuning that you have to do when you're doing machine learning and that every change you make makes some part of it perform better and some part of it perform worse. And so learning about those techniques to optimize and figure out what you're optimizing for, I think that has been some of the more interesting parts of the work and the learning beyond just running some code is how do you-- there's a lot of nuance involved that I don't think a lot of people realize before they get into it.

SUSAN: 08:18

Yeah, yeah, absolutely. I love you saying that random forest is your favorite go-to algorithm. I think we should all have patches or lapel pins for our favorite algorithm [or something like that?] so we can show them off. So as far as learning about optimization and all of the tuning options that you have, was that mainly a process of trial and error of working with your team and learning from others? What were some of the main ways that you developed that sort of deeper level of knowledge?

RENEE: 08:45

Yeah, a little bit of each. I mean, I think a lot of it was just experience because you can read books and blog posts about different techniques, but different techniques work better for different types of data. And so developing that domain expertise of what it's like to work with this type of data that I work with in my job, categorical data and continuous variables and things like that, having lots of columns to go through and engineering features and how doing each of those things on our specific type of data affects the outcome. It would be very different than somebody, for example, working with computer vision and a self-driving car. That's also a type of data science. But their work and what they would need to learn to make that work well and the types of algorithms they use and the type of code they write would be very different. So I've kind of specialized for the task at hand. So, yeah, learning optimization, a lot of it was trial and error and being surprised at things that worked. If I kind of automated a variety of different parameter, looping through a bunch of different parameters or doing a grid search and seeing what ended up coming out on top. But then also kind of following my gut instinct in some cases and saying, "Well, if I change just this piece of it, how does that change the output?" and learning from experiencing it and from looking deeper at the data, not just running a data set through my algorithm and ignoring what's in the data set. I think really understanding the data and where it comes from and what your different transformations are doing to the data and what it means, that's really a part that I've learned is important and you can't just automate it all away.
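
For readers who want to see what "looping through a bunch of different parameters or doing a grid search" can look like, here is a minimal sketch in plain Python. It is not HelioCampus's code. The parameter names `n_trees` and `max_depth` and the `toy_score` function are hypothetical stand-ins; in real work the scoring function would train a model with the given settings and evaluate it on held-out data.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical stand-in for 'fit a model with these settings and validate it'.

    Peaks at n_trees=200, max_depth=8 purely for illustration.
    """
    return -abs(params["n_trees"] - 200) / 100 - abs(params["max_depth"] - 8) / 10


# Illustrative grid only; sensible ranges depend on the data and the algorithm.
grid = {"n_trees": [50, 200, 500], "max_depth": [4, 8, 16]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

The exhaustive loop is the automated part. As Renee notes, it complements rather than replaces changing one piece at a time and looking closely at the data.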

SUSAN: 10:31

Yeah, yeah. That really speaks to the issues of both communicating with the people who are giving you data and learning all of that background and information to inform that process. But then also sounds like exploratory data analysis as well and not ignoring that part of the process, not jumping right to the favorite algorithms.

RENEE: 10:47

Absolutely. That's a really important part of the process.

SUSAN: 10:50

Yeah, yeah, for sure. So was there something kind of on the flip side now? Was there something that was extra challenging for you as you were going through your learning process?

RENEE: 11:00

Yeah, I think the thing that's most challenging for me, at least-- for example, in my master's degree, I took some courses that were taught by people in math departments, some in engineering, some in computer science, and the notation was different for each of them. So learning how to read mathematical texts and understand [their teaching?], it's still a challenge for me now. So I would say that's the most challenging part. I think I can get the concepts and if somebody explains it to me, I get the hang of it like, "Okay, I understand why you would do that and what the approach is," but then reading it in text form with these equations that have different notation and matrix calculus and things like that, that's probably the most challenging piece for me.

SUSAN: 11:43

Yeah, for sure. I know I've had at various points different cheat sheets printed out for notation just so that I could look at it and be like, "Okay, yes, let me translate this into normal words," so, because my background is not in math. So yeah, definitely that is an obstacle to learning, but one that can be overcome. So that's encouraging. So you mentioned earlier, too, though, that as you have moved into data science, your learning isn't over, right? You are continuing to learn and develop new things. But now you're also working full-time doing data science and being a director, right, and taking on all these bigger projects. So how are you balancing now that continuing learning with your full-time everyday work?

RENEE: 12:25

Yeah, well, I mean, I'm lucky that my work requires some of the learning so I can do that learning on the job. I do not make a lot of time for learning projects anymore on the weekends. I used to do that a lot when I was first learning data science. At this point I learn what I need to learn for the work and I'm lucky that I get to make that time at work. So it's just a normal part of doing my job is that I expect to have to learn new things. I expect that when I make a timeline for a new project that I have to build in time for learning new techniques or talking to colleagues and finding out how they would approach it and integrating different approaches. And now we're working more as a team. And so we're each contributing techniques to kind of a central repository. And so we can more quickly access and learn from what people have done on previous projects. So it's a very project based learning for me. I mean, even when I was learning on my own before. I don't learn well by just taking a class or going through a textbook, I have to have a reason for doing the learning. And so having a project that I'm excited about and a data set that's interesting to me makes it easier to learn the technique and then understand if I'm doing it right based on the outcome, because it's a data set that I understand the context and it's not completely foreign to me where I wouldn't know if it was the right answer or not.

SUSAN: 13:47

Right. Right. No, that makes a lot of sense. Yeah. Certainly having some sort of project with a set goal and a deadline, deadlines are always helpful from experience being able--

RENEE: 13:55

That's true. It makes your learning more efficient because then you'll only learn what you need to do for the project.

SUSAN: 14:01

Right, right, right. And stay focused on it. Yeah, definitely. So I'm curious about the central repository for sharing knowledge that you mentioned. Is that something you can tell us a little more about?

RENEE: 14:11

Oh, so just basically starting to get more as a team so that when we change our techniques or learn new projects, we can share that. Also within our group, one of my team members has built a framework so that we can start standardizing our approach. And so as we use new techniques, we can integrate that into the framework so that we all have access to each other's approaches and techniques. So it's just something that I've decided as a director that is something that would be really useful to have not only with our current team members, but when you onboard somebody new to say, "Here's our typical way of doing things, but if you have a new way to do it, you can also contribute," I think it helps get people up to speed more quickly. And it's a benefit I didn't have when I was starting out that I wish I did, being able to lean on the other data scientists in the team. So it's something I really prioritize, but it's really just making sure that we can all access each other's work and that we share that on a regular basis. So when we have our weekly meetings, our team meetings, we talk about different approaches that we're taking to solving problems that we've been presented with. And then we also have a monthly data science roundtable where we present not only to other data scientists, but to the whole company, anyone that wants to come can see what we're currently working on, what approach we took, how the clients have responded to it. So sometimes you can do something that you think looks really cool, but the clients don't understand it. And so being able to communicate the results to the clients is also important. So not just technical techniques, but presentation techniques and dashboard design and things like that we also share.

SUSAN: 15:50

Yeah, I can understand definitely that communicating with clients and getting them to understand the technical details, but also the main takeaways of your projects could be challenging. So I know one of the big things that you have focused on in your public communications about data science has been demystifying data science and talking about those concepts really clearly. So do you have a couple of your top tips for helping people who are also working on that skill who want to be able to communicate about data science more clearly?

RENEE: 16:19

Sure, I gave a presentation at a conference about this recently where I had started out, I kind of pitched the idea of giving a presentation about making presentations and saying, "Here are some tips for that final presentation." But as I was developing it, I realized all my tips were really about communicating throughout the project and not waiting until that final presentation in order to get people on board. And so I changed my whole presentation to be about communicating about the project up front. So I really think that it starts with how you work with the stakeholders throughout the whole project, how you define what the deliverable is, explain what you're working on and then kind of bring them along with you. So when we do exploratory data analysis at HelioCampus, we have regular meetings with the end users at the institutions. And so we show them, "Here's what we're finding. Does this make sense? Does this go along with what you expected or is it surprising to you? And if it's surprising, we need to check to make sure that we're correct," and that could be an interesting finding. Sometimes we find out that there's something wrong with the data. And that it was surprising because it's wrong and it's better to find out early. So as we go through that communication cycle with the client, then by the time we get to the machine learning piece, they know what inputs are going into the model. We talk about the most important features when we generate the scores and we deliver the scores in context. So we're not just giving a number, it's not just black box. And so when the end users see those results, they understand it already. So when we're doing our final presentation, even if the presentation includes people that weren't a part of the process, we have allies on the team that can explain it in their terms and to other people on their campus of what we did, why we made certain choices and why the results came out the way we did. 
So we kind of already have the buy-in and the understanding before we get to that final presentation.

SUSAN: 18:20

That's great. I love that idea of kind of building your support crew into the process and then having them ready to help with that explanation and presumably implementation as well, once your project is complete. That's awesome.

RENEE: 18:33

Yeah, definitely.

SUSAN: 18:35

So are there a couple of projects that you are able to share with us?

RENEE: 18:39

Yeah, sure. So we're doing a lot of different predictive models that are around understanding processes at the institution or what factors at the institutions are most associated with certain outcomes. So, for example, for student retention, a lot of schools are interested in for students that start as first time, full-time freshmen that are degree seeking - and we have to define it like that. We're not just saying students, it has to be a certain set of students - what factors for those students seem to be most correlated with their retention, meaning they're still enrolled a year later. So there's a lot of schools are measured by that retention rate. It's a big metric that they're measured by. And of course, you want the students to stay enrolled and to move towards their degree. So the first step towards getting a degree is to get your first year under your belt and end up with enough credits to continue and have a good outcome. So we've done different projects with data all the way from admissions to financial aid to enrollment and course outcomes and being able to explore all the details and then correlate that with whether the student retained or not. So it ends up being a simple binary classification model with a lot of different inputs that are interesting to explore. And then talking to the institutions about potential policy changes or interventions that they could do to help more students get to that one year retention and measuring if they make a change, how does that change retention rates or how did the students respond. So it's definitely interesting work to be able to get data from all the different points in a student's life cycle and highlight what seems to be working well.
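
To make the cohort definition concrete, here is a small sketch of how "first-time, full-time, degree-seeking" students and the one-year retention label could be expressed in code. The `Student` record, its field names, and the five sample rows are invented for illustration and are not HelioCampus's schema or data.

```python
from dataclasses import dataclass


@dataclass
class Student:
    # Hypothetical fields; real institutional data would be far richer.
    student_id: str
    first_time: bool
    full_time: bool
    degree_seeking: bool
    enrolled_one_year_later: bool  # the binary outcome a retention model predicts


def in_cohort(student):
    """True only for first-time, full-time, degree-seeking students."""
    return student.first_time and student.full_time and student.degree_seeking


def retention_rate(students):
    """Share of the cohort still enrolled a year later, or None if the cohort is empty."""
    cohort = [s for s in students if in_cohort(s)]
    if not cohort:
        return None
    retained = sum(s.enrolled_one_year_later for s in cohort)
    return retained / len(cohort)


# Made-up sample rows.
students = [
    Student("a", True, True, True, True),
    Student("b", True, True, True, False),
    Student("c", True, False, True, True),   # part-time: excluded from the cohort
    Student("d", False, True, True, True),   # not first-time: excluded from the cohort
    Student("e", True, True, True, True),
]
rate = retention_rate(students)
print(rate)
```

Restricting the population before computing the rate is the point: the same students counted under a looser definition would give a different number.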

SUSAN: 20:21

Yeah, very interesting. My own background was in academia prior to joining Alteryx. And I actually just interviewed someone else who does work with data in higher ed, and one of the unexpected factors that she mentioned was parking permits. And I thought that was super interesting as a potential factor for retention. So I imagine you see some surprises along the way as well.

RENEE: 20:41

Yeah, that is really interesting. And something like that is not data that all schools have easily accessible. And so what we tend to do is start with a type of model framework that takes into account the most common variables that most institutions have. And then we'll talk to the stakeholders and find out are there other pieces of information like surveys or data from parking permits or door swipes at the library that could be informational. And so we'll work with the institutions to kind of join that data into our standard data set. But not every school has that data at that level or has a way to make it easily accessible and refreshable. So building a pipeline is part of the project as well. So instead of doing a one-off piece of research, we're really trying to build a pipeline to be able to monitor these different pieces of information over time. So, yeah, a special piece of information like that are really useful, but it's often hard to get them in a frequently refreshed way that is accessible for a lot of different models and experimentation and can be updated over time and to have enough past history to train your model, right? So a lot of times the schools will come to us and say, "Oh, we have this cool new piece of information." "Well, you just started collecting it this semester. So we don't have the history in which to train the model on."

SUSAN: 22:03

And that goes back to that communication piece then I imagine then having to explain how models are trained and why six months of data might not be enough to do the kinds of predictions they're interested in.

RENEE: 22:14

Absolutely, and that's another tip I would give is we have a presentation that we gave at the beginning of the project to all the stakeholders about what the process is going to look like and how we do exploratory data analysis, what we're looking for, how we have to check to see if the data is available far enough back in time, if the trends are consistent. And so the data field is good for training a model because it hasn't changed meaning dramatically over the time that we're using for the training set. And then now we're doing communication about how major changes like COVID have affected our models and affect the results. So the communication piece and giving them an expectation up front about what the process is going to look like does really help.

SUSAN: 23:00

Absolutely. Can you tell us a little bit about that, about what you're seeing as far as the effects of COVID on universities' data and how some of the ways that you might be accommodating that?

RENEE: 23:10

Yeah, well, what we found out when we looked back at models that, for example, were built in 2019 to predict 2020 outcomes, we saw that the models sometimes overestimated the rates or the likelihoods, but the factors were still the same. So a model might have anticipated that a lot more students would enroll than did, but the things that were keeping them from enrolling, such as financial aid issues or academic issues, were still the same and had the same impact. There were a few changes and we had to kind of dig in and understand. For example, at one institution, the nursing program suddenly had much higher retention than usual. And at that time nursing was in the news and nurses were in high demand. So it was really interesting to find that. We had to dig in. And that was another thing that was part of explaining that process and having a transparent model, we were able to dig into the specific factors instead of just saying, "Your numbers are wrong," we could say why they were wrong and in what way and better understand how things had changed, what were the impacts. Or for students where we didn't know why the change happened in certain groups, we can then seek more information and say, "Hey, our model is missing some component because it started performing really well for this particular, or performing poorly for this particular group and we don't know why. So we need more information." Or it might highlight something that the school doesn't even collect or that we're not aware of, like a student's family situation or working outside of campus. So there were a lot of things that were highlighted by COVID, but for the most part, the models held up in terms of the things that were predictive. They just struggled in terms of scale. 
Oh, and one other thing that is becoming a challenge going forward, now that 2020 is going to be part of the training data going forward, the question is how do we include that data, should we leave it out as an anomaly year, are things going to permanently change or go back to normal? And one thing that's proving challenging is that GPAs are usually predictive for a lot of things, retention, graduation rates and things like that, and a lot of schools went to pass-fail grades for at least one semester. And so the GPA distribution has changed dramatically or we just lost information about a student's specific grade. So though it might have helped the student, they're more comfortable with passes on their transcript instead of maybe a lower letter grade than they would have liked, that is going to prove challenging to the actual modeling, because that's a major piece of information that we won't have. And so we have to think of things like maybe we'll translate all the past grades into pass-fail and see if we can include the pass-fail and make it useful. Or maybe we leave 2020 out of the training data and just treat it like an anomaly. So we're going to have to figure that out as we go forward. But because our models are transparent and we have that depth of domain knowledge, we can make those decisions in an informed way.
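
One of the options Renee mentions, translating past letter grades into pass-fail so that all years are on the same scale, might be sketched like this. Which letters count as passing varies by institution and course, so `PASSING_LETTERS` (treating D as a pass) is an assumption. Returning `None` for anything unrecognized, such as a withdrawal code, is also just one possible choice.

```python
# Assumption: A through D count as passing; many programs require C or better.
PASSING_LETTERS = {"A", "B", "C", "D"}


def to_pass_fail(grade):
    """Collapse a letter grade (optionally with + or -) to 'P' or 'F'.

    Grades already recorded as pass/fail are kept. Anything unrecognized
    returns None so it can be reviewed rather than silently guessed.
    """
    g = grade.strip().upper()
    if g in ("P", "PASS"):
        return "P"
    if g in ("F", "FAIL"):
        return "F"
    if g and g[0] in PASSING_LETTERS and g[1:] in ("", "+", "-"):
        return "P"
    return None


print([to_pass_fail(g) for g in ["A-", "C+", "F", "P", "W"]])
```

The conversion is lossy by design. An A and a D become indistinguishable, which is exactly the loss of information she describes.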

SUSAN: 26:23

Yeah, yeah. That's so interesting. It reminds me of something that I talked about in the interview for Inspire actually, with Alberto Cairo, where we talked about data seeming very sciencey and objective and to some degree, right, we can find things in data that we might not expect, like what you said about the nursing program suddenly having such high retention. And at the same time, you do have to make those kinds of decisions like, "Well, maybe an A is going to have to just be coded as a pass for this particular model and we just go with that." And it's interesting to hear you talk about the depth of subject-matter expertise being really important for making those kinds of decisions.

RENEE: 27:00

Yeah, humans definitely make a lot of decisions through the process that influence the outcome. People do like to think of machine learning as like, "Well, a computer can't be biased. It's just doing math." But people decide what data to collect, how to transform that data, what subsets of the population you're building the model for. If somebody decides to opt out and you can't use their data, then let's say a lot of people in one demographic all have opted out, well, your model may not represent that demographic at all. And so what does that mean for the performance? People make decisions on what factors to include, what algorithms to use, what to optimize for, what performance metric are you optimizing for and then how to apply the results of the model, what do you do with this information in the end. So even though, yes, technically computers are doing math to-- your data is going through an algorithm and there's not human bias in the algorithm. There's so many human choices made along the way that the output absolutely can include human biases.

SUSAN: 28:07

For sure, for sure. Any other projects that you wanted to talk about that you're doing right now?

RENEE: 28:12

Not specific projects, but an area that I'm really excited to do some new exploration in is what I just talked about with bias and machine learning and understanding the effects of our models and where they're performing well and where they're not. So my team recently attended a conference related to the fairness and accountability and transparency of machine learning. And you hope to pick up some techniques that will just solve all the problems and remove all the concern, and it really brings up more questions than solutions when you attend something like that. But there's really exciting research going on in the field about ways to address bias or to detect bias. And so that's an area I'm really interested to continue to learn more about and to make sure that we're using techniques in our models to make sure that we're not perpetuating certain biases. And also kind of hand-in-hand with that and related to projects that work, we do a lot of communication with the client about how to use models and how not to use models or when we shouldn't build a model for something. So, for example, if the data isn't as robust as we need to really get a good outcome, should we be building that model when the data's not really in the shape we need? Or just because somebody's asking for an algorithm or an analysis, do you do it exactly the way they want or do you kind of consult and explain why one approach is better than the other? And so we're working a lot on that kind of communication about how models and the results of models should be used. And of course, most of our clients are totally on board. People don't want to harm the students. 
But just making sure that we're documenting things, that's a big project for us now so that in the future, as people that we haven't worked with start using the results of these models that we built in the past, we understand when it should be retrained, how well the model's performing, what the model was built for and what's included in the model, and then what the prediction means so it doesn't get massively misinterpreted later on.

SUSAN: 30:23

Yeah, yeah, for sure. All really important issues that you brought up there. And I think things that everybody is struggling with around bias and accountability and documentation, right? Those are ongoing struggles, I think, in a lot of areas, but maybe particularly weighty in your field. So, yeah, super interesting to hear you talk about that. So we have one question that we always ask on Data Science Mixer to our guests and I'll ask it to you now. So we call this little segment the Alternative Hypothesis. And the question is, what is something that people think is true about data science or about being a data scientist but that you have actually found to be incorrect?

RENEE: 31:03

Yeah, so the biggest myth that I see is people think you need to learn everything before you can call yourself a data scientist and there's no such thing as learning everything. I mean, I'm years into this and my bookmark list of things to learn has gotten longer and not shorter.

SUSAN: 31:19

Makes me feel better.

RENEE: 31:20

So yeah, learning the basics and understanding how to evaluate models and understanding the statistics behind what you're doing so that you can understand what you've just created is really important. And understanding the data going into your model and having that domain knowledge is really important, but you don't have to know every technique. I mean, for example, I don't work with images. I don't work with natural language processing in my current job. So I don't have any depth of expertise or techniques other than the very most introductory basic techniques in those areas. So if I were asked to build a model related to that or to work with an institution that wanted that type of analysis, I would have to lean on someone else's expertise. But with the skills that I have, I've been able to develop depth in the area that's needed for my role. And it's really not as expansive as a lot of people expect going in. So I would say that's the biggest myth. You could really find a subset of skills that you want to become good at. If there's a certain industry you want to go into, make sure you have that domain knowledge and kind of specialize. So learn a lot of basics, learn a lot of depth in a specific area, but you can't and shouldn't expect to develop depth in all areas. It's impossible.

SUSAN: 32:39

That's good to hear. And I think that will be very reassuring to a lot of folks who I think we've seen, for example, just what's offered in online training courses and so forth, just expand and expand and expand. It's like, oh, how can any one human ever conquer all of this, right? It's huge.

RENEE: 32:56

It's overwhelming. But yes, you don't have to. You don't have to conquer all of that.

SUSAN: 33:01

Good to know. Good to know. Is there anything that we haven't talked about that you would like to get in there while we have this opportunity to chat?

RENEE: 33:09

The only thing I've been thinking about lately that I think might be interesting for people to think about that was less of an issue when I was a data analyst but has become kind of forefront as a data scientist is that all of your choices are tradeoffs with modeling. So when you are evaluating models, you'll see a typical evaluation of a classification model is a confusion matrix. So you have your true positives and true negatives and false positives and false negatives. And just knowing that every choice you make to optimize for one of those things will necessarily affect one of the other things and understanding what to optimize for. For example, in our retention models, we might not want-- we might sacrifice overall accuracy in order to improve the negative class recall if the purpose of the model is to address potential issues with retention. So if you're trying to find students that might need an intervention and might need special tutoring or might need extra financial aid, you don't want your model to miss them. So you might build your model so you might have more false negatives, but it allows you to address potential issues rather than the most accurate model that misses a lot of students that potentially have a likelihood of not continuing with their education. So just understanding the decisions and that every decision you make affects the model outcome in some way and that those outcomes have real world impact. It's just a lot more to think about in that area than I initially expected. And when you're doing your practice models and learning if you're trying to become a data scientist, playing around with not just optimizing for the most accurate model with the overall accuracy, but understanding those different evaluation metrics and explaining why you might choose one over the other, that's a really good skill to develop.
And it would sound good in an interview if you were asked about it and you can really go into depth about that, because it's something I didn't think about before I was doing this on the job. And I think it's something a lot of people miss in the learning phase, but it becomes so important in the real world applications.
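
[Editor's illustration] The tradeoff Renee describes can be made concrete with a small sketch. Everything here is an assumption made for illustration, not HelioCampus's actual data or method: the labels and scores are invented, "retained" is treated as the positive class (so "not retained" is the negative class), and the `evaluate` helper and the two thresholds are hypothetical. Raising the threshold for predicting "retained" flags more students as at risk, which costs some overall accuracy and adds false negatives but improves negative class recall.

```python
# Toy data, invented for illustration: 1 = student retained, 0 = not retained.
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical model scores: predicted probability that the student is retained.
scores = [0.95, 0.90, 0.85, 0.80, 0.65, 0.60, 0.75, 0.55, 0.30, 0.20]


def evaluate(labels, scores, threshold):
    """Predict 'retained' when score >= threshold, then tally the confusion matrix."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1  # retained student predicted as retained
        elif predicted == 1 and y == 0:
            fp += 1  # at-risk student the model missed
        elif predicted == 0 and y == 0:
            tn += 1  # at-risk student correctly flagged
        else:
            fn += 1  # retained student flagged unnecessarily
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "accuracy": (tp + tn) / len(labels),
        # Share of truly not-retained students that the model catches.
        "negative_recall": tn / (tn + fp),
    }


default = evaluate(labels, scores, threshold=0.5)
cautious = evaluate(labels, scores, threshold=0.7)
print("threshold 0.5:", default)
print("threshold 0.7:", cautious)
```

On this toy data the stricter threshold lowers accuracy from 0.8 to 0.7 and introduces two false negatives, while negative class recall rises from 0.5 to 0.75. Which of the two is the better model depends on what an intervention costs compared with missing a student who needs one.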

SUSAN: 35:26

Absolutely. And there's so much nuance there that comes with experience and it's nuance that has real world consequences. I mean, for example, when you're talking about students who might need an intervention so that they continue in college, I mean, that's a pretty real life consequence for that particular individual and then on a larger scale for the institution. So super important thing to be thinking about. Yeah, interesting. Well, Renee, thank you so much for joining us on Data Science Mixer. I think you've shared a lot of insights that people are going to be able to immediately take to their own studying and their own career growth. And a lot of really interesting examples of stuff that you're working on that will inspire them. So thank you so much for being here.

RENEE: 36:04

Great, thanks for having me. This was fun.

SUSAN: 36:06

Yes. [music] Thanks for listening to our Data Science Mixer chat with Renee Teate. Join us on the Alteryx Community for this week's cocktail conversation to share your thoughts. Here's our conversation starter this week. I mentioned earlier that maybe we data people should have badges or lapel pins for our favorite algorithms. What would your lapel pin look like for your favorite algorithm? Draw it on a sheet of paper or doodle it on your tablet and post a pic. Share with our community by leaving a comment directly on the episode page at community.alteryx.com/podcast or post on social media with the hashtag data science mixer and tag Alteryx. Cheers. [music]

Subscribe to Data Science Mixer on your favorite podcast listening app like Apple Podcasts, Spotify, or right here on the Alteryx Community.

sireeshagandam · Answer

good

SusanCS · Answer

So glad you enjoyed it, @joshuaburkhow! I've really enjoyed following Renee over the last few years, and I was thrilled she was able to join us for the podcast.

joshuaburkhow · Answer

This was so good @SusanCS ! Loved it