NEIL: 00:03 |
[music] Welcome to Alter Everything, a podcast about data science and analytics culture. I'm Neil Ryan, and I'll be your host. Back in 2019, Alteryx welcomed a company called Feature Labs to join our platform, and I remember how excited I was when I learned about the acquisition because I knew we were adding some serious data science chops to the team. So you can imagine how equally excited I was to get to talk with Feature Labs founder and current VP of data science engineering at Alteryx, Max Kanter. |
MAX: 00:34 |
I actually joined Alteryx just about a year ago when they acquired a company I started called Feature Labs. And our goal at Feature Labs was to make machine learning easier to use, and I think we'll go into it a little bit deeper today. So now that we're a part of Alteryx, Feature Labs has turned into the Alteryx Innovation Labs where we're now about 30 people strong, building out new data science and machine learning products for the Alteryx platform. |
NEIL: 01:02 |
Joining Max is Clara Duffy who was an intern at Feature Labs and is now an intern at Alteryx. She's also a rising senior at Harvard studying computer science with a secondary in statistics. No big deal. |
CLARA: 01:15 |
At Harvard, I am the captain of the women's lightweight varsity rowing program. So I spend 20 to 25 hours a week training for rowing. And then in the rest of my free time, I've been a peer advisor teaching fellow for a CS class and work on empowering other women in CS with various positions on the board of Women in Computer Science. |
NEIL: 01:35 |
We talked about their team's impressive advances in machine learning, Alteryx's commitment to open source, and what it was like to work at a startup before joining Alteryx. We also have a special announcement at the end of the show, so stick around for that. Let's jump into it. [music] Cool. So yeah, I guess it's almost a year coming up, what, next month when Feature Labs joined Alteryx. So happy anniversary. |
MAX: 02:04 |
Yeah. Thanks. |
NEIL: 02:05 |
So Clara, why don't we start with you? You mentioned you're an intern here at Alteryx. This is your second summer though, right? |
CLARA: 02:15 |
Yeah. So last summer, I got to work with Feature Labs over the summer as an intern which was my first internship ever, which was a really great experience, and it made coming back here a really easy decision when I had the opportunity. |
NEIL: 02:28 |
Cool. So what was Feature Labs like that first summer? How many people were working there? |
CLARA: 02:34 |
I think there were maybe five full-time employees and four interns. Although, Max could correct me if I'm wrong. It was a really small office which I really liked because I feel like I got to know everyone really well and be able to see all the different things that the company was working on at the time and have an understanding of them. |
NEIL: 02:55 |
Cool. And Max, I want to talk a little bit about your experience, your decision to join Alteryx, but maybe we should start a little further back. What made you found Feature Labs? |
MAX: 03:11 |
Yeah. Absolutely. I think the founding of Feature Labs, I guess, probably starts before I even considered turning it into a company. And it really starts in actually the first few internships I had as a student while I was at MIT, and those summers I was working for a handful of companies like the New York Times, Fitbit, Hewlett Packard. And throughout all those experiences, I was working on data science projects. My background was in software engineering, but a lot of the things I was working on were related to improving things like the search result times when you search on Twitter, or at Fitbit using the accelerometer data to predict sleep quality. And as I was working on those projects, I noticed that I was solving similar problems over and over again. And so when I returned to school after those internships, I needed to pick a research project, and I wanted to pick something that was related to the work that I did. So I went out and looked at the different labs, and I found a lab that is now called the Data to AI lab, but they essentially focused on making machine learning easier to use and trying to automate the different steps of the process. |
MAX: 04:25 |
And so before even starting Feature Labs, I picked up a research project that ended up turning into my master's thesis, and that was to build out a system called the Data Science Machine. And so we built out the Data Science Machine, and we wrote an academic paper on it. It was published in a peer-reviewed journal, and MIT News decided to write an article about how we had used this system to automatically build machine learning models and compete on a data science competition platform called Kaggle. And at Kaggle, they essentially take datasets from companies and offer cash prizes to the teams that can build the best predictions for that dataset, and so we competed in those competitions with an automated machine and performed really well. And so the MIT News article focused on that, and a bunch of different publications from around the world ended up picking it up. I think it was something like 150 different publications, and companies started reaching out asking how they could take advantage of this automation technology because they were facing problems building machine learning models, hiring the people capable of doing it, and scaling up their efforts within their company. And so once those people started to reach out, I just became interested in figuring out how I could turn that research into a company and so I ended up starting Feature Labs, basically, the day after I graduated alongside my thesis advisor-- or was my thesis advisor at the time, turned into my co-founder, Kalyan Veeramachaneni. And we focused on taking some of that initial research we had at MIT and rebuilding it out into a production-level, enterprise-ready system that we've been working on for the last five years now. |
NEIL: 06:03 |
That's cool. I think the day after I graduated I hadn't even started applying for jobs yet, and the day after you graduated you founded a company. So I guess that's why you're a VP and I'm not. That's a cool story. So Clara, back to you. Can you tell me a little bit about what you were first working on last summer, the first summer you joined Feature Labs? |
CLARA: 06:29 |
Yeah. Sure. So after I kind of got acquainted with what Feature Labs was doing, I was able to-- I was given the freedom to choose a place that I wanted to focus and a project that I wanted to do. And that past year, I'd been taking some linguistics classes alongside my computer science ones, and I know Max had been interested in seeing if Feature Labs could do something with natural language processing. So I got really excited about that and got to do a lot of investigation further into natural language processing than I had at school. And ultimately, I was able to create an add-on library to Featuretools, which was Feature Labs' main product at the time, that focused on adding natural language processing primitives, which was a tool that Featuretools would use to turn data into a form that was able to be machine learned on. So it was a really impactful project I felt like, and it was very cool to be able to learn so much while still creating something that was useful. |
NEIL: 07:34 |
Very cool. So you built on top of the Featuretools open source library. Can you talk a little bit about Featuretools? |
CLARA: 07:42 |
Sure. So Max is obviously more the expert on this, but Featuretools is a library that's open source, so anybody can use it and anybody can help develop it. And it allows users to take data that might be unstructured or at least is not quite ready to be put into the algorithms that actually do the machine learning step, and it gets more insight from the data in an automated way. So Max mentioned the Data Science Machine, so I guess it was kind of the name of what the Data Science Machine was before. Max can probably say a little bit more, a couple more specifics on that, but. |
NEIL: 08:28 |
Take it away, Max. |
MAX: 08:29 |
Sure. So yeah, as Clara mentioned, Featuretools is an open source library. We actually didn't start by open sourcing Featuretools the first two years of the company. It was commercial only, and one of the impetuses for open sourcing it was we wanted as many people to solve impactful problems with machine learning as possible. And Featuretools really plugged a key hole in the process that no other tool did, and that particular hole was the step that's known as feature engineering. [music] And so when you think about building a machine learning model, there are essentially two inputs to actually training the model. [music] There's the variables or features that you want to use, so these are things that kind of describe the example you're looking at. So imagine you're trying to predict what a customer is going to buy next. These are properties about the customer that you can use to make your prediction like, how much have they spent in the last 30 days? What was the last product that they bought? How many things have they added to their cart, if it was an e-commerce store? And then the second input is the label that you want the model to predict, and so in this case it would be, say, the product that they ended up purchasing. And then once you train the model, you're going to feed it just the features, and you want the model to predict the label that you had provided it at training. And so Featuretools focused on taking data sets that were complex. So spread across multiple tables, they may have been too fine-grained. So you have training examples collected, say, every single second, and you need to aggregate them to a lower frequency of data, and so Featuretools could do that process for a lot of different datasets. |
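A minimal sketch of the feature engineering step Max describes, written in plain Python rather than with the Featuretools API. The purchase rows, the cutoff date, the feature names, and the `build_features` helper are all hypothetical, chosen to mirror his e-commerce example: fine-grained rows are aggregated into one row of features per customer, and each row is then paired with the label the model should learn to predict.

```python
from datetime import date, timedelta

# Hypothetical fine-grained table: one row per purchase.
purchases = [
    {"customer_id": 1, "day": date(2020, 6, 1), "product": "shoes", "amount": 60.0},
    {"customer_id": 1, "day": date(2020, 6, 20), "product": "socks", "amount": 8.0},
    {"customer_id": 2, "day": date(2020, 4, 2), "product": "hat", "amount": 15.0},
    {"customer_id": 2, "day": date(2020, 6, 25), "product": "scarf", "amount": 22.0},
]

# Hypothetical labels: the product each customer bought next, after the cutoff.
labels = {1: "laces", 2: "gloves"}


def build_features(rows, cutoff, window_days=30):
    """Aggregate purchase rows into one feature row per customer as of `cutoff`."""
    window_start = cutoff - timedelta(days=window_days)
    features = {}
    for row in sorted(rows, key=lambda r: r["day"]):
        if row["day"] >= cutoff:
            continue  # only use data known before the prediction time
        f = features.setdefault(
            row["customer_id"],
            {"spend_last_30d": 0.0, "num_purchases": 0, "last_product": None},
        )
        f["num_purchases"] += 1
        f["last_product"] = row["product"]  # rows are sorted, so the last one wins
        if row["day"] >= window_start:
            f["spend_last_30d"] += row["amount"]
    return features


cutoff = date(2020, 6, 30)
features = build_features(purchases, cutoff)

# Training examples: (features, label) pairs, the two inputs to model training.
training_rows = [(features[cid], labels[cid]) for cid in sorted(features)]
```

At prediction time only the feature row would be passed to the trained model, which is why the sketch ignores any purchase on or after the cutoff.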
MAX: 10:23 |
Clara mentioned that we wanted to extend that to natural language, so that was her project last summer creating the add-on to the Featuretools library. And that was a really big project for us because now you could use Featuretools to not only handle the data you collect while you're operating your business that you might store in a relational database, but also data that you might have from text fields. Maybe you had survey results and you were collecting natural text, or you're looking at tweets or forum posts. And so that really extended the capabilities of Featuretools so that it could work on, basically, data in as raw of a format as possible so that you can automate the part of the process where humans are spending the most time. And I mentioned that natural language processing project Clara did was a great add-on for Featuretools. I just looked it up. In the last seven days, it's had over 1,500 downloads, and so there's a lot of people using that tool to extend Featuretools for more datasets. And so the way this all connects back to the Data Science Machine that I mentioned is the Data Science Machine worked on a kind of end-to-end process. [music] And so we pulled out just the feature engineering component and turned that into Featuretools. And then now that we're at Alteryx, right, we're building out all the different steps of the machine learning process starting with this preparation and blending that users do today in Alteryx Designer and taking them all the way through that process until they have a model that they're ready to deploy. |
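To illustrate what turning free text into something a model can learn from might look like, here is a small sketch in plain Python. It does not use the add-on library Clara built; the `text_features` function and the three features it computes are assumptions picked for illustration only.

```python
import string


def text_features(text):
    """Turn one free-text field into a few numeric features."""
    # Split on whitespace and strip surrounding punctuation from each token.
    words = [w.strip(string.punctuation) for w in text.split()]
    words = [w for w in words if w]
    num_words = len(words)
    if num_words == 0:
        return {"num_words": 0, "mean_chars_per_word": 0.0, "diversity": 0.0}
    unique_words = {w.lower() for w in words}
    return {
        "num_words": num_words,
        "mean_chars_per_word": sum(len(w) for w in words) / num_words,
        # Share of distinct words: 1.0 means no word is repeated.
        "diversity": len(unique_words) / num_words,
    }


survey_answer = "Great product, great price!"
row = text_features(survey_answer)
```

Each survey answer, tweet, or forum post becomes one row of numbers that can sit next to the features derived from relational data.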
NEIL: 12:03 |
[music] I imagine if you named the company Feature Labs, you must consider the feature engineering step pretty important to the overall process. |
MAX: 12:12 |
Yeah. It's always interesting when you're naming something. I think we liked the name Feature Labs, but from the beginning, we always knew that we were going to go beyond feature engineering, and so we had a lot of debates. When we first came up with a name, we used to be like, "Oh, we'll come up with something different later." But that never quite happened, and so we were Feature Labs. |
NEIL: 12:36 |
How many downloads does Featuretools have at this point, by the way? |
MAX: 12:42 |
Yeah. So Featuretools at the time of the acquisition by Alteryx, we were around 500,000 downloads, and it took us about two years to get to that point. I think in the last year, really the last 11 months, we've had another 500,000 downloads, and so yeah, we've seen great uptake from pre-acquisition to continuing and really accelerating this past year. |
NEIL: 13:07 |
That's awesome. Now, Max, you mentioned how from the beginning you always knew you were going to develop the end-to-end process out more than just feature engineering. Clara, we talked a couple of weeks ago, and I know you worked on something different this past summer than your first summer. What did you work on this summer? |
CLARA: 13:27 |
Yeah. So this summer, I have been working more over the full machine learning kind of process, I guess, more so than just the feature engineering. And currently, I'm working on building an API that interacts with a back end that is also being built right now that is also supplying information for a user interface that's really pretty, so lots of interconnected [work?]. But it allows users to have one place that they can go through the whole feature engineering and machine learning pipeline, and yeah, I think it's interesting because it combines all the different Feature Labs/Alteryx Innovation Labs products into one place. And it's really interesting to be able to use my expertise from last summer and knowledge from the beginning of the summer where I also got to work on another product in the kind of sphere of Innovation Labs, which is EvalML, and combine all those things to create one really cool product so that users don't have to pick and choose different libraries, and they can do it all in one place. |
NEIL: 14:41 |
So last year, you were working on top of Featuretools, that open source project. This year it's EvalML. Can you talk a little bit about the EvalML? |
CLARA: 14:53 |
Yeah. So EvalML is an automated machine learning tool so that say you took all your data and you're trying to predict something. So say, I think Max might have given a retail example like, how much is the customer going to spend based on all this data that you have about them? And so you might have used Featuretools as a library to get all this data about each user and have this whole dataset where you then want to train a model with that dataset, and then you can use the conclusions that you've come to predict something. And so EvalML is kind of an automated way to choose what the best model is for doing that. So as much as machine learning is just a buzzword, there are many different ways to do machine learning. So there are different models which means different kind of mathematical regressions that you can use to choose like, how am I going to take this data and predict something else? So EvalML is an automated way to test all these different models or regression ways and see which one of them performs the best with your data, and then you can also tune it which means like changing some of the parameters based on stuff that you might know that just the automated tool might not. And so it was really interesting to see kind of the other side of the machine-- or another part of the machine learning pipeline a little further down and get to learn how that can be automated. |
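The idea of testing several models and keeping the one that performs best on your data can be sketched in a few lines of plain Python. This is not the EvalML API: the two candidate models, the mean-absolute-error metric, the toy data, and the `search` helper are all assumptions made for illustration.

```python
def fit_mean(xs, ys):
    """Baseline model: always predict the average training target."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """One-variable least-squares regression."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / variance
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_absolute_error(model, xs, ys):
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)


def search(candidates, train, holdout):
    """Fit every candidate on the training data and score it on held-out data."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(*train)
        scores[name] = mean_absolute_error(model, *holdout)
    best = min(scores, key=scores.get)  # lower error is better
    return best, scores


# Toy data: number of past purchases -> amount spent.
train = ([1, 2, 3, 4], [3.0, 5.0, 7.0, 9.0])
holdout = ([5, 6], [11.0, 13.0])

candidates = {"mean_baseline": fit_mean, "linear": fit_linear}
best_name, scores = search(candidates, train, holdout)
```

Scoring on held-out rows rather than the training rows is the design choice that matters here: it compares candidates on data they have not seen. The tuning Clara mentions would add an inner loop that retries each candidate with different parameter settings.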
NEIL: 16:24 |
Very cool. Yeah. It's been an exciting year for me at Alteryx and I imagine for a lot of Alteryx customers because earlier this year we released the Alteryx Intelligence Suite which introduced Assisted Modeling, so kind of an intelligent aspect of Alteryx Designer that guides a user through building a predictive model. But even more recently, just last month we introduced some fully automated model building capabilities, and I believe that uses EvalML under the hood. Is that right, Max? Can you talk about that a little bit? |
MAX: 17:03 |
Yeah. It completely does. And I think that's actually a good point to mention why we build the different open source libraries and how that relates to the products that Alteryx customers end up using. And one of our goals from the beginning, and I mentioned this with the decision to open source Featuretools a few years ago, was to get our tools and technologies into the hands of as many people as possible to build impactful solutions to their problems. And if you think about where the open source tools work is they're typically used by developers and data scientists who are comfortable in Python code, in Jupyter Notebooks, and they want to kind of pick the tools they want to use for different steps of the process and have complete control and customizability about how those tools fit together. |
MAX: 17:55 |
On the other side is a lot of the users of Alteryx's products today, business analysts who want to quickly come up with the machine learning solution to a problem they have a deep understanding of, and the faster the machine learning can kind of get out of the way and give them a solution so that they can use their human expertise to iterate on things like, "Hey, are we trying to solve the right problem? Or can we evaluate the solution and figure out how to incorporate these predictions into the decision-making in the company? Or can we take this model and deploy it to create new products for our company?" Sometimes you don't actually need to get down into the details of the raw Python code that a lot of our open source libraries use. And so one of the exciting things for us joining Alteryx, and this is one of the biggest impetuses for deciding to join the company completely, was the opportunity to take a lot of that automation technology that was only accessible by data scientists and developers who are comfortable in Python and put those into the kind of code friendly and code-free platform that Alteryx had. And so, yeah, since the acquisition we've been looking for opportunities to do that in the integration and with Assisted Modeling. To have an auto-modeling path is really exciting for us, and it was great to see that go out in the release this past quarter. |
NEIL: 19:20 |
Yeah. We call Alteryx an analytic process automation platform, so more than just a data science platform. We think here at Alteryx a lot about automation, and now, with autoML, we're bringing it to the data science side of things as well. We've talked about it before on the show just even with all these emerging automation technologies, the importance of keeping a human in the loop. What's your view on that, Max, just in terms of are data scientists going to get automated out of their jobs or business analysts? What's your long term view on that? |
MAX: 20:00 |
I mean, I think one of the things that we observed - this actually goes all the way back to the research that I was doing at MIT - is typically, once you build the first machine learning model, it's not as if you're done at that point. Typically, that then motivates and inspires 10 more models that need to be built, and so what we see with all the automation tools that we're building is that it's actually being used to accelerate the process. And when you accelerate the process, there's actually more work to be done because if a data scientist built a model with our tools, they now have 10 more requests they need to work on. And without the tools, they'd have no way of solving those 10 problems, and then those 10 problems each on their own create 10 more problems. And so what we really see the automation is doing is twofold. First is enabling all the people like data scientists who are doing work today to be more efficient and scale up the problems that they solve, and then second is bringing more people into the fold to solve these problems. And time and time again, we see no shortage of machine learning problems that need to be solved, and by creating these automation technologies it's not taking away jobs from anybody. It's actually creating the opportunities for people to create new models and impact their businesses. |
NEIL: 21:29 |
That makes sense to me. Clara, I wanted to ask you-- so you mentioned all the way back when you introduced yourself that I think you've been taking a lot of CS courses. Do you consider yourself now that you've worked at Feature Labs and Alteryx more of a software engineer or a data scientist? |
CLARA: 21:49 |
I think I would definitely consider myself more of a software engineer than a data scientist. But I think there is an interesting kind of correlation-- or not correlation but relation between the two because I think a lot of those skills that engineers use can be applied to data science. And not only just knowing how to code or something like that, but the same kind of problem-solving mindset that can be used in data science can be used in engineering. |
NEIL: 22:16 |
Max was saying earlier that he was also studying computer science and through his internships ended up working on some data science projects. It sounds like you're following in his footsteps a little bit. He also said that through his internships, he kept solving the same problems over and over which led to him founding Feature Labs. So what problems are you solving over and over, Clara? What company are you going to found the day after you graduate? |
CLARA: 22:43 |
Well, I guess I have a little bit more time for that now since I'm pushing back my graduation a bit. I don't think I have an idea, but I did found a small company in high school. So I think that was, again, one of the reasons why I was really attracted last year to working at a startup was to see a more mature startup than the one that I had created in high school, and a tech-based startup rather than one that I had created. |
NEIL: 23:06 |
Wait, what was the company you started in high school? |
CLARA: 23:09 |
So my best friend and I started a company called Seniors Connect, which sounds like a dating website but it is not. It was a service that we had to help seniors connect with their loved ones by teaching them how to use technology better. |
NEIL: 23:25 |
Very cool. You sound like you're definitely going to end up being an entrepreneur to me. |
CLARA: 23:30 |
That's the dream. |
NEIL: 23:32 |
Awesome. So now that you've had these two consecutive summer internships under your belt, do you have any advice for students listening to this podcast episode on how to get a good internship, what to look for? Sounds like you would recommend having an internship. |
CLARA: 23:51 |
Yeah. I would definitely recommend having an internship. I think it is really useful to think about how you want to go about your career later. I mean, obviously, I haven't even graduated yet, so I don't exactly know how I want my career to go. But I think it's a really cool opportunity to try something out, try a company out for a couple of months, get some meaningful work experience, see if you like the field or not. [music] And then as to how to find a meaningful internship, I think a lot of the advice that I've been given is to find a place where they'll trust you with a big project that will actually have an impact, and I think that I've definitely found that to be the case here. Each summer, I've been able to talk with Max or my manager and find something that both I was interested in and would have some meaningful impact at the company so that I wasn't just off in the corner working on some internal tool and feeling kind of dejected because I was like, "Ah, nobody's ever going to use this." But Max said the tool that I worked on last summer, I guess, has 1,500 downloads a week-- or in the past seven days, which kind of surprised me. I didn't notice that much, but that's really exciting. And even like earlier this summer when I saw that people were still working on the tool that I built last summer and still using it, it made me feel really excited that I was able to create something that actually had an impact. So I think definitely my advice to somebody who might be looking for an internship would be to do your research on the company, see if you can talk to somebody who's already worked there, and see if you would be able to work on something and if you'd be able to be trusted with something that would actually have an impact at the company or be used by a lot of people. |
NEIL: 25:38 |
[music] That's great advice. Max, anything you're allowed to kind of tease out? What are you guys working on right now over in Boston in Alteryx Innovation Labs? |
MAX: 25:48 |
So I think if you look at the integrations we've done into Assisted Modeling by adding in the auto-modeling step, that's a great preview of what's to come. We're really looking to take that to the next level. And so if you think of Assisted Modeling today, it works really well to help people build their first machine learning model because it runs on the desktop. It's easy to install, and it can really quickly walk you through the steps and educate you about how the machine learning process works. And so what we're working on now is, how do we really up that to the next level? And so for the new products that we're working on, they're server-based. They run in the cloud. They scale to big datasets. They take the automated machine learning and search many more modeling types and tune the parameters of those models so you can build really accurate models. And while you're doing that, they present you visualizations and different automated insights into your data. And so our goal going forward is to basically take all the success and great stuff that you can do with machine learning and predictive modeling and Alteryx today and put that on steroids. And so I think you'll see a lot of new products coming out over the coming quarters and years towards that goal. |
NEIL: 27:08 |
That sounds pretty exciting. What about in terms of open source? So Featuretools has been tremendously popular, over a million downloads. Historically, Alteryx hasn't put out tons of open source software. We've open sourced things from time to time. We open sourced the .yxdb file format, our kind of Alteryx database, but not tons more to my knowledge. What are we doing with open source software going forward? |
MAX: 27:38 |
Mm-hmm. Yeah. So Alteryx is certainly not new to open source. They have had open source software like Python. The programming language is open source and accessible within Alteryx Designer today, same with R. A lot of the machine learning tools that are inside of Alteryx today are all built on open source software, so Alteryx is very familiar with open source. But you're right that I think with the acquisition of Feature Labs, we're going to be ramping up a lot of the open source involvement we have. And so we've mentioned EvalML a few times today, and that's one of the latest libraries that we've open sourced in the Innovation Labs. And we have a roadmap of other functionality that we plan to open source, and talking with engineers across the company, they have lots of ideas of what they want to do. And so as Alteryx grows, so will their contributions to the open source community that they've been a part of for many years now. Those will grow as well, and so I'm really excited to see not only what my team can do but the hundreds of engineers across Alteryx, the ideas that they come up with, and the different tools and libraries that they want to open source. |
NEIL: 28:58 |
That's good to hear. If there's one thing I know about data scientists, it's they love open source software, so [crosstalk]. |
MAX: 29:03 |
Absolutely. |
NEIL: 29:07 |
Cool. Well, I've really enjoyed talking to you guys today, Max and Clara. Thank you so much for joining us. |
MAX: 29:13 |
Yeah. Thank you. It was fun. |
CLARA: 29:15 |
Thanks, Neil. |
NEIL: 29:16 |
[music] Thanks for tuning in. For more on Alteryx Innovation Labs, EvalML, and other open source resources powered by Max's team, we have them linked in our show notes at community.alteryx.com/podcast. Oh, and remember that special announcement I mentioned at the top of the show? We've officially launched the Alteryx Data Science Portal. Go to datascience.alteryx.com to find podcast episodes like this one along with data science blogs, data science community discussions, and links to resources like EvalML and free data science courses, also career opportunities. Check it out, bookmark it, and spread the news with your fellow data scientists. Catch you next time. [music] |