Alter Everything Podcast

BRIAN: 00:06	Welcome to Alter Everything, a podcast about data and analytics culture. I'm Brian Oblinger and I'll be your host. We're joined today by Ken Black and Garth Miles to discuss big data, the process of creating great content, and the data scientist's role. Let's get right into it.
	[music]
BRIAN: 00:28	Gentlemen, welcome to the show.
KEN: 00:30	Thank you, Brian.
GARTH: 00:31	Excited. Thanks, Brian.
BRIAN: 00:33	Yeah, absolutely. It's wonderful to have you guys on. This is going to be episode two of the podcast, so it's just going to be great. Where I would like to start, as always, is with some intros. So Ken, let's go ahead and start with you.
KEN: 00:46	Hi. My name's Ken Black. I work as a data scientist in the automotive industry. I've been doing computational science, various types, for over 30 years now. And in this podcast, I hope to tell a story about that and where I see modern analytics, where I see it now and where I see it changing in the future.
GARTH: 01:10	Hi. I'm Garth Miles. I'm an engineering manager here at Alteryx and I'm really excited about this episode. I've been watching and following Ken for many years now and can't wait to be a part of this experience.
BRIAN: 01:27	So one of the things I'd like to dig in first here is let's talk a little bit about how each of you came to analytics because I think as we're talking to more folks along the way here, we're realizing that one of the really interesting things about the industry is that people have really different stories about where they came from, what they were doing before analytics, and how they found themselves here. So Ken, I'd love to kick off with you. I know you have a great story about that.
KEN: 01:55	Well, sure. Thank you, Brian. For my story to begin, we have to go way back in time to the mid-1980s when I was doing my master's degree. I did both a bachelor's and master's in geology with an emphasis on mathematics and computer science. So back in 1985 to '87 time frame, my master's thesis was called A Microcomputer Groundwater Data Analysis Program. What that really meant was that I combined data analytics with simulation, visualization, predictive analytics to do things related to groundwater remediation. So back then, of course, we had slow computers. We had very little memory. So a lot of the things that we were doing were limited in the computational realm by the hardware that we had at the time. Certainly, even back then, hardware was ahead of software but not by much and so we had to do a lot of things the hard way. So when I launched my career then, for the first 20 years, I took that experience and pretty much was working on groundwater remediation studies around the country. So computational models we were building were three and four-dimensional models that looked at how do you remediate contaminated groundwater? The purpose of that work was primarily environmental remediation for cleaning up groundwater supplies, but it was more than that. It was projects like the Exxon Valdez oil spill in Alaska, things like cleaning up groundwater in Cape Cod. That was a project that spanned 10 years. Another big project to work on was the environmental restoration of the Everglades. So these are major environmental projects, but there were also a lot of little ones where we were looking at cleaning up water supplies and national laboratories, places where we had radionuclide transport, where we had significant environmental problems.
KEN: 04:01	So all along that pathway of doing that work, I did a lot of work in writing computational codes in languages like Fortran and C and C++, graphical processors to be able to take the data that we were calculating in the models and visualize it. So creating animations, creating 2D and 3D plots of contamination, and then, a lot of quantitative post-processing of the data to calculate things like contaminant migration, velocities, contaminant removal quantities, volumetric fluxes of groundwater, things like this. So for 20 years, I did a lot of computational science. And after 20 years of that, it was getting to the point where I thought to myself, "I'm really good at this and I could continue to do this, but I want to try to do something else." And at the time, I was coaching basketball and one of my friends-- one of the boys on my team, his dad had a company that did process improvement work. And so in about 2007, about 11 years ago I guess now, I switched over from environmental science to process improvement. And during that time, I got exposed to a lot of different types of businesses, a lot of different types of data coming from the businesses where we were focusing on trying to improve processes. The industries included medical manufacturing and transportation, healthcare, just a variety of things where processes needed to be improved. So part of what I did there was I rewrote the company's software. I reverse engineered it to bring it up to contemporary standards. And then that kind of began my journey into really kind of pure business analytics.
KEN: 06:01	And so when you get training in statistical process control theory and you write software to do that, there's a lot of learning to be had. So I went from this scientific background to this very applied statistical testing methodology that we used. And then eventually, after about seven or eight years of doing that, I switched over to the automotive industry. So that's sort of the history of how I got to where I am now. But before I conclude that, let me just say that really in about 2008, I started doing a lot of visual analytics work using Tableau. And then about four-and-a-half, five years ago, I got Alteryx. So in the first five years of using Tableau, I took on more and more challenging projects and I got to the point where I couldn't really solve them exactly in Tableau. I had to do a lot of custom coding to augment Tableau's capability. But then once I got Alteryx, all of the custom coding went away and all the data prep went into Alteryx to drive all the work that I do. So everything that I do now goes directly into Alteryx. All the data prep and manipulation happens there and then results get kicked out for visual analytics. So that's sort of how I got to where I am.
BRIAN: 07:24	Wow. Looking forward to you trying to top that one, Garth.
GARTH: 07:28	Yeah, well, I have a followup question before I even try. But Ken, can you tell me a little bit about what languages you were using pre-Alteryx in your coding efforts?
KEN: 07:44	Sure. A lot of the work that I did-- I say that I can program in 10 different languages and people hear me say that and they're thinking I'm talking about spoken languages but they're actually computer languages. So my master's was done in Turbo Pascal, a Borland product way back when, so I became proficient in Pascal, Fortran, C, C++, of course, HTML, XML, CSS, all of the web-based technologies, a little bit of C#, a little bit of Python. But basically, since I have been doing Alteryx, I just pushed aside almost all the programming and it's all done in Alteryx. So why should I spend my time writing custom codes when Alteryx does everything for me now? That's really where I am.
GARTH: 08:38	Yeah, that's a great answer and something I've heard a lot. My story, to be, actually, pretty brief, and it's certainly not as storied and as exciting as Ken's, but I kind of describe it as the story of the phoenix, albeit I'm not the magical bird rising from the ashes. I'm just a bird, but I used to be in real estate. And in 2008 and '09, well, we all know what happened there. And I managed to make it out alive so to speak, nothing too damaging, but decided to use this opportunity to make a career change. And got into the world of data through a company that was a customer of ours as a GIS specialist - excuse me - and worked my way through that role for a few years. That's actually when I learned about Alteryx itself. I started using Alteryx to do some spatial processing and whatnot. And after a couple years, an opportunity at Alteryx came up and I jumped at it and I've been here ever since. So about four-and-a-half years I've been at this company, but my background started in real estate. And through a series of, now I look at it as fortunate events, I have joined a world that's been fun, exciting, and challenging. And I get to meet people like Ken and have conversations with them on a regular basis, so [crosstalk].
BRIAN: 10:21	And now we're stuck with you.
GARTH: 10:22	I know. I know. I'm sorry about that, but it's good for me. Selfishly, I get to absorb all the knowledge from all of you people. And so maybe I'm greedy that way, but I'm enjoying it.
	[music]
BRIAN: 10:41	So I think what would be awesome to talk about next, I know that, Ken, you have this background in maybe what I would call doing, air quotes, "big data." Although, happy for you to maybe amend that or give us the definition as you see it. Would love to hear your thoughts on what is big data? What does it really mean? How should people be thinking about it? And then maybe a little bit from you on kind of best practices or thoughts about how people should go about using this in their organizations or beyond.
KEN: 11:14	Again, let's go back a little bit in time to about four or five years ago. When I first started using Alteryx, I had to learn it in the context of project work. I had two very intense projects which were given to me and I didn't have the luxury of learning Alteryx fundamentals. I had to learn on the fly and luckily for me, I had that programming background so things went pretty quickly. But in the first year of learning Alteryx, I realized that at the time, there weren't a lot of comprehensive examples of projects that I could go find, read about, and then replicate to learn Alteryx on my own. Now, things are very different, but back then, I basically decided to do a little study. My study was I wanted to do a full spectrum analytics project where anybody could do it. You could go out, get the data from the internet, process the data in Alteryx, visualize it in your favorite package, and then learn about the topic. And the topic that I chose was global climate. I wanted to know-- there'd been so much talk about global warming and climate change through the years that I wanted to know for myself what would happen if I went and got the data, did all this work, documented it, and then would I be able to draw conclusions from it? Because I'm not a climatologist or a meteorologist, but I am a geologist so I had a pretty good science background and pretty good computational background for understanding models and simulation frameworks and all this sort of thing. So what I did was in 2014, I wrote a five-part article series that I called my Alteryx manifesto, which is how do you go out, get this climate data? How do you process it? How do you visualize it?
KEN: 13:16	And so it's something like How You Build an Alteryx Workflow to Create Visualizations in Tableau. It's some title like that. I can't remember exactly. Well, when I did that, it took probably at least more than 100 hours to do. So it was quite an undertaking, but in doing it, I learned a lot about how Alteryx can handle large data volumes because the climate data goes back into the 1700s and there's billions of records. So if you're going to play in that arena, if you're going to be able to look at data from more than 100,000 monitoring stations over a couple hundred-- over a couple centuries, you've got to be pretty efficient with the way that you handle data. So that was probably my first exposure to handling big data in Alteryx. Since that time, a year or two went by and I went back and I wrote a whole nother series of articles about my observations on climate change. So I've been able to refine the approaches that I set up in the beginning to process this data very efficiently because if you don't process it efficiently, it'll take you forever to get through and then you're going to end up with way more data than you want to use. So a lot of that stuff is documented in my article series. And people do read it and I suspect that some people have learned a few things from it. On my regular job, I've been exposed to even larger volumes of data from a few different projects. And I'm not going to go into detail about these projects, but there's lots of streaming data that's coming my way. And over the past, say, year or so, I've been working on developing techniques to efficiently handle data in Alteryx that's coming through a big data environment, either coming through Hadoop or Oracle or whatever the situation may be.
KEN: 15:23	And I've been working on establishing these methods and writing and documenting them. So a lot of those techniques are available now if you have the patience to read the articles and follow the training videos that I've created. So I guess what I want to say to you, Brian, about this is that, to me, data is data. How big it is, how fast it's created, all of that stuff, a lot of that's buzzwords to me. What I'm trying to do is uncover the stories hidden within the data. So whether it's massive quantities of data or small quantities of data, my techniques are a lot alike. I use various things like aggregation, slicing and dicing in different ways, and I think a lot of that goes back to my computational framework. And that's why I spent the time to talk about it earlier where you're building these three and four-dimensional models and you can't see things in 3 and 4D that easily, so you have to slice and dice. And that's kind of what I do with business analytics, too, is I use those same approaches to handle large volumes of data. So in this one case that I'm talking about here where I'm receiving, say, 400 million records of data per month from millions of objects that are creating this data, one of the things I've realized is that if you compartmentalize that data into smaller subclasses of information, so let's say that you have 60 different types of equipment that create this data, if you store the data by those 60 different pieces, that's manageable and it makes it a lot easier to retrieve the information when you need it.
KEN: 17:19	So instead of storing everything in one gigantic file, if you break it down into smaller pieces, then you can retrieve what you need quickly, you can write workflows that are very efficient in extracting the information that you need when you need it. So a lot of the techniques I'm developing now, sort of what I consider to be kind of on the cutting edge of analytics research for dealing with big data, I'm doing that stuff in Alteryx. And of course, we have competing platforms that I could be using, but the reality is I've been able to do it very effectively with Alteryx so there's no really reason for me to try anything else.
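The compartmentalizing idea Ken describes, storing records by equipment type rather than in one gigantic file, can be sketched outside Alteryx in a few lines of Python. This is only an illustration under stated assumptions, not Ken's actual workflow: the field names (`equipment_type`, `object_id`, `reading`), the CSV storage format, and the helper names `partition_by_type` and `load_partition` are all hypothetical.

```python
import csv
from pathlib import Path

# Hypothetical sample records; a real feed would be far larger.
SAMPLE = [
    {"equipment_type": "pump", "object_id": "A1", "reading": "3.2"},
    {"equipment_type": "valve", "object_id": "B7", "reading": "0.9"},
    {"equipment_type": "pump", "object_id": "A2", "reading": "2.8"},
]


def partition_by_type(records, out_dir, key="equipment_type"):
    """Stream records into one CSV file per distinct value of `key`.

    Returns the sorted list of partition names that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    handles, writers = {}, {}
    try:
        for row in records:
            value = row[key]
            if value not in writers:
                # First record of this type: open its file and write a header.
                fh = (out_dir / f"{value}.csv").open("w", newline="")
                handles[value] = fh
                writers[value] = csv.DictWriter(fh, fieldnames=list(row.keys()))
                writers[value].writeheader()
            writers[value].writerow(row)
    finally:
        for fh in handles.values():
            fh.close()
    return sorted(handles)


def load_partition(out_dir, value):
    """Read back only the partition for one equipment type."""
    with (Path(out_dir) / f"{value}.csv").open(newline="") as fh:
        return list(csv.DictReader(fh))
```

With the data split this way, a question about one equipment type only touches that type's file, for example `load_partition("partitions", "pump")` after calling `partition_by_type(SAMPLE, "partitions")`. One open file per type is workable for something like the 60 types mentioned; a much larger number of distinct keys would call for a different layout.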
GARTH: 18:00	I think that actually, Ken, you've made several great points. And a couple more maybe nuanced ones are the debate, which I think everyone seems to be narrowing in on kind of an agreement here, but the debate about what is big data? And a great book came out, I don't know, a couple years ago, I can't even remember, I read by a person named Viktor Mayer-Schönberger, I believe. It's called Big Data: A Revolution That Will Transform How We Live, Work, and Think. And made a point in the book about how big data is basically-- his definition was it's when the data is large for your processing environment, right? So nowadays, you're talking 400, 500 million records and you're talking about terabytes, maybe petabytes. I mean, that's what most people often think of as big data, but it really is what your environment can handle. And to Ken's second point about breaking things into small parts, small slices of value if you will, it makes the data-- it starts converting the data, if you will, into information - right? - and making it manageable at each step and really improving your workflow process. Specifically, how you're handling and prepping the data for any sort of analytic purpose is-- I believe the CRISP model is 80% of your effort if not more now. Just to get to the starting point takes 80% of your time and effort. So I've read, Ken, your series. It is rich with information. It's long. There's a lot of-- there's a lot to absorb there, but I highly recommend everyone take the time. You'll only be better as a result if you read your series.
KEN: 20:09	Thank you. One of the things that I wanted to say was that if we talk about the buzzword of big data and I go back in my career, I was already using billion line files in the '90s. We didn't call it big data back then. We just called it model input. And so we're running these really large models for doing things like designing locks and dams on the Mississippi River, finite element models where you have millions and millions of cells. And so you have these high-resolution models and you need big files that provide the input to these things. So back then in the scientific community, we weren't thinking about, "How big is this data?" It's just you did what you had to do to get the job done. In business, it seems like there's more of an emphasis on, "Oh, look at how much data we collect." Well, just because you collect it doesn't mean it's going to be perfectly suited for your analysis. Pretty much, I have a couple of mantras. Number one is all the data that I've ever been given, never once has it come in the right form for me to use it by itself exactly the first time. I always have to do something to data. And number two, I treat all data as guilty until proven innocent. So there's always data QA. There's always things that have to happen to data. And I think this is the role of Alteryx that makes it such a valuable tool because all of those things that have to happen can happen within the context of an Alteryx workflow. And for me, in the beginning, when I first started using Alteryx, I thought it was just a collection of tools which do these little, independent modifications to data like transpose the data or add a formula, just individual, little operations like I had done through all the years of working the hard way, as I call it.
KEN: 22:14	When you get to the understanding of what Alteryx really is, how it's a beautiful implementation of an object-oriented programming platform where all the tools work together, they work harmoniously, they're designed to support each other, I mean, you can run them independently, but you can build tremendously capable workflows for big data or small data. And you can do it so fast and they're so repeatable and so reliable that it's like you write these custom codes in a fraction of the time and then you can just run them over and over and over whenever you need. And so for me, one of the great breakthroughs was my comprehension of what Alteryx gives me in terms of a holistic platform from ingesting the data to modifying the data to writing it out in the form that I need it to go, whether it be quantitative or visual. Everything can be done there. That is where it's so transformative working with big data is that it's not only fast-- well, listen, things are fast for two reasons. Number one, you have fast hardware. Number two, you have optimized software. Alteryx is optimized, has been from the beginning, for great throughput. The reading and writing from hard drive, that's kind of a fixed entity. I mean, we're stuck with what we've got right now until faster hard drives come along and better technology. We can only read and write so much data. But the key things is that Alteryx has this great bandwidth with this great pipeline that allows the data to come in quick, be processed quick, and get out quick. And that's what makes it so special to me is that there's no limits being applied to what I do on the daily basis in Alteryx.
GARTH: 24:06	Yeah, and that's a good point. And you made a really good one, I think, early on that the value of data-- just to sum it up is the value of data is not in its bits and bytes but the information it contains, right? And so the processing and extracting that information is the value. And I agree with you. The fact that our tools have empowered-- it's the great enabler, in my opinion, our platform, because you have brilliant minds who just perhaps don't have a programming background, but that's not what's important. What's important is being able to derive the value and the insight from data. That being said, I also think of at least Designer as a great enabler because if you do hit the limits of the platform, which it's becoming harder and harder to hit those limits as we update with new and improved and more offerings and more features, you have the ability to step outside of the platform or step outside of our products and use tools that are better suited for the job at hand. And for example, you can write up a script and run it through a Run command tool and execute that script via Run command tool if necessary. And lastly, I mean, sometimes people are just comfortable with what they're comfortable with. And I feel like Alteryx can abstract away some of the annoying or painful work and some of the prep and blend. And then some of the more quantitative stuff if you're more comfortable with Python or R or some other language, execute it in that environment and then bring it back into Alteryx. So not to get too off-topic, but I agree with a lot of your points. And, yeah, want to just double down on what you're saying, Ken.
	[music]
BRIAN: 26:06	So Ken, you've talked a little bit about your blog and how important it's been to you. One thing that I think would be really interesting is to hear how has that helped you in your career? How has that helped you inform your opinions? And what have you learned from technical blogging?
KEN: 26:21	So when I launched my blog, it was really kind of a clandestine scientific experiment. I wanted to do a long-term blogging experiment. I couldn't find any information that somebody had actually done this before. So for two-and-a-half years, I ran this sort of process improvement-based blogging experiment without calling it that. And so I wrote about a lot of different techniques. I wrote, initially, about 170 articles to find out what people were interested in, what they didn't like, what were they responding to? How long did it take to get a readership? And so after two-and-a-half years, I just said, "Okay, done. End of experiment." I kind of explained the experiment and then talked about some of the conclusions in the epilogue. And then I decided, "Well, am I going to continue this or not?" Well, I continued it and the reason I continued blogging was that-- and this is getting back to the answer to your question is what have I learned? Number one is when you write a technical blog, it's a scary proposition because there's always people out there who are smarter than you, who are better than you, who can do things differently than you, more efficiently, and you sort of put your vulnerabilities out there on display. And so when I first started writing, I was writing in a vacuum. I was just popping these articles out. They were coming into my head. I was writing them. And it was basically based on all of those years of experience of me using Tableau and then eventually Alteryx to create these ideas and techniques. And so I decided about after 100 articles that I was changing from just writing a technique-based blog to a problem-solving blog. And the reason for that was in my process improvement work, the work that we do in analytics is to solve problems and to make things better.
KEN: 28:18	So my mission then changed from just writing about certain techniques in Tableau or Alteryx to how do you solve problems holistically? And so a lot of people have said, "Oh, your blog's too technical," or whatever. "It's too wordy." People have told me, "I don't read it because it's too long." The thing about it is is I've never had that opportunity to read something like that when I was learning so I wanted to give back by writing a very in-depth coverage of these topics, which are real time important topics. These are all of us collectively using tools like Alteryx, Tableau, and other things. We're all trying to get better. We're trying to make the world a better place. And so if I could share the things that I've learned in the context of a blog, then I thought that that was a good thing. And that's what I've been doing. And so eventually people find it, they read it, but it's kind of a thankless profession. I wouldn't call it a profession. It's a volunteer activity. So you just have to do it. It's not easy. It's not necessarily that much fun, but the most unexpected thing that happened was it made me better. I didn't expect it to but it did because it gives me a memory of what I've done, how I did it, why I did it. And I can look backwards in time through my history of my blog and I can see where I was back then and where I am now. And I can see my self-improvement and that's one of the best things about it.
BRIAN: 29:53	Yeah, and I can actually confirm for you now, live, here on the podcast, that creating content and putting it out for the people is a career for both Garth and I. So I understand what you're saying though about the thankless comment you made. I think it's more you hear from people that have feedback, some of it critical, some of it positive. But what you don't hear and I think this goes back to a point Garth had made earlier, you don't always hear from the people who wandered by your blog or your community or in Garth's case, the content he's creating around Alteryx products, you don't always hear from the folks that click into it, get a tremendous amount of value, and then they put it into action and go about their day, right? And I think that's the thing you got to get comfortable with is someone that's putting out content is you're not always going to hear from the people who get the most value out of it. And you just have to trust the process that, "Hey, I know what I'm putting out is good and I think other people are going to get a lot of value out of it," and that's all that you can do. And I've read a lot of your blogs. I'm not as smart as you or a lot of people probably that read it, so the value that I get is a little bit probably different than other folks, but I think it's obviously a tremendous resource. And for those listening to the show here, we'll put the links, obviously, in the show notes over at community.alteryx.com/podcast.
	[music]
BRIAN: 31:26	One thing I'd like to go back to for just a moment, I was thinking as you were talking through earlier, Ken, your background in scientific arenas and then transitioning into kind of the business world and then going back and forth with your blog and so forth, when I think about the term data scientist - and this is another term we're hearing a lot data science, data scientist - the question I have for you is when I think of you as a "data scientist," I guess earlier on in your career, you were a scientist that used data, right? And now, in the business world, maybe you've transitioned to a data worker that's applying that science to get to the end goal or the answer or the insight. I'm just curious if there's some type of crossover or correlation or how do you see those two kind of? Are they different? Are they the same or do you wrap them all into one process? I'm just kind of interested to explore that further.
KEN: 32:28	I wrote an article a few weeks ago-- a couple months ago called something like Data Science: Why Data Must Come Before the Science. And it doesn't matter what advanced technique you're using, it could be TensorFlow from Google, it could be whatever machine learning algorithm, if you don't get the data right before you give to this platform, then what you're going to get out of it is noise and not very useful. So really, I think data comprehension and learning how to handle data properly is the most important thing young workers need to work on earlier in their career. And if they get the data part right, if they can use a tool like Alteryx to really understand how to manipulate and control data, then that's going to set them up for more advanced stuff in their life so that they can become "the data scientist" that does machine learning or the data scientist that does logistic regression modeling, whatever it may be, whatever their forte is. The main thing is they've got to work with the data. And that's one of the biggest deficiencies that I see in incoming workers is they don't know how to handle the data, they don't know how to manipulate it, they can't corral it. And that's where the role of Alteryx is in my opinion.
BRIAN: 33:48	Yeah, and we're hearing this a lot right now where companies are coming and telling us, "Hey, we have all this need" - right? - "for data scientists and data analysts, but there's a shortage where there's just not enough. We can't hire them fast enough." As an organization, we're attempting to insert Alteryx into university programs. Of course, we have the Udacity Nanodegree for business analysts. We're trying to create other learning opportunities through the community with Alteryx Academy, but I think you're making an interesting point which is maybe the best way is just to have seat time, right? You're there with some data and applying it in a real-world scenario. I mean, how would you weight it? Do you think that the instructional piece is good to augment or is it both? Is it one or the other? What kind of mix-- if you were going to bring someone up today, how would you set them off on a path to get where you think they need to go?
KEN: 34:52	Good question. Typically, what I've done with the incoming workers that have worked with me is I give them a task and I don't give them Alteryx initially. I just say, "Here's the task. This is what I need you to do. Go do it. Use what you learned in school to do it." And then they'll go do it. They'll struggle through it for a day or two or three and they'll come back and say, "Hey, look, this is what I did." And then I'll sit down with them and I'll do it in Alteryx in about 20 minutes or 10 minutes or whatever the task may be. And they're just astounded because that's the best way for me to teach them is I make them do it the hard way, then I show them the easy way. And when you begin to see those things over and over and over and you learn that the reason that Alteryx exists is because it makes everything easier, then you want to use a tool like that because it makes you very productive. You're not spinning your wheels for three days trying to do something in SQL or whatever technique they had to do. So with the people that I've tried to influence, I've given them those kinds of tasks but I've also given them-- basically, when I find interesting blogs or I find interesting stories, I tell them, "Go read this one because this is important," or on the job, when a job comes to me that I can handle, I send it out to them and say, "Try to learn this tool to do this." So it's really getting hands dirty with the data, Brian. That's really the most important thing. I mean, you can sit back and read and study, but when you work with the data and you apply the techniques, that's when you learn them the best.
GARTH: 36:35	Yeah, good point. So if I were going to maybe answer your question, Brian, about what does a data scientist represent or what is a data scientist, who is, people from my team and who've been on my team in the past, they may recall that I have a pretty rigid definition or expectation of data scientists. And I think it all speaks to Ken's point that first and foremost, it's time in the seat, right? It's time doing the thing. So back in, yeah, July of 2013, I'm going to botch this name, but an individual named Swami Chandrasekaran, I believe, I can put it in the show notes, but wrote this blog about becoming a data scientist and actually created an infograph called the Curriculum via Metromap. And it pretty much just outlines if you really want to be a data scientist, in this individual's mind who I happen to agree with, these are the things that you need to be at the very least proficient at and it covers the gamut. And kind of to double down on Ken's point, starting with the fundamentals, I'll use - excuse me - a common analogy or a comment you'll hear when people say, "Hey, I want to be a front end developer. I want to use Angular," or, "I want to use React," or, "I want to use whatever framework is the hotness these days." And the most pragmatic experts in the field that I follow and listen to, they say, "Well, learn plain vanilla JavaScript first. Don't use the framework that abstracts away all of the knowledge that has been gleaned over the years," right? "Understand what you're doing and why you're doing it and then you can make the decision on the tools that you want to use" - right? - "to make your life easier."
GARTH: 38:31	In a way, Alteryx is just another abstraction library - right? - to solve problems that people have been trying and Ken has been solving for years and years, right? But a data scientist really is the person that has the fundamentals down, a quantitatively minded individual, understands visualizations and communicating information, as well as up-to-date with common machine learning methods and practices and has a fairly broad toolbox, if you will - right? - comfortable in a lot of different proprietary or open-source platforms and is really for the most part, I would say, maybe at a mastery level of a lot of those different areas. And to be honest, there's nothing wrong with not being a data scientist. You can be a quantitative programmer, a mathematical programmer, a data analyst, a data engineer. You can be all these other things that are really, really complicated, sophisticated roles. But I think we use the term data scientist a little too loosely now and it sort of muddies the water of the work and the skills and the tools that really, in my mind, comprise what a data scientist is.
	[music]
BRIAN: 39:59	Okay, so last but not least, let's talk about our community picks. So what have you found recently in the Alteryx community that you find interesting or you'd like other people to take a look at?
KEN: 40:08	The weekly challenges, for me, have been not something that I do. I just haven't had the time to do them, but a few months ago I was going into a coding competition at work and so I took some time to go through the history of the weekly challenges. And when I did that, I realized that there's a lot of examples in there where there's these beautiful little techniques that are stored in these different solution techniques from people who have solved each of these different challenges. And so I wrote about that in a post called The Finer Things in Alteryx. And I think for people who are new to Alteryx who are looking for inspiration or looking for rapid learnings, they could go to the weekly challenges and spend some dedicated time, an hour or two, and go through 10, 20 examples and sit back and just look at what people have done, look at the way that they've attacked the problem, and learn from that. And so what I did was going into my top coder challenge, I wrote this piece. I extracted all of these techniques and then I had them at my fingertips ready to use when I went into the coding challenge. So for me, the weekly challenges have a lot of value.
GARTH: 41:29	My pick is there was a thread started all the way back in November 2015 by Ada Perez. I always mess the name up, but the thread is available big data sets over the internet. And this may not be common knowledge yet, but one of the commenters in this just ongoing thread, it's still pretty active, LordNeilLord is the user name, posted that FiveThirtyEight actually opened up all of their data sets, posted the link in there. So now, you get to go-- you can go to FiveThirtyEight, actually grab the data set that they wrote an article about, and if you want and you have the time and energy, you can try and replicate the results that they've uncovered. It's kind of fun, pretty cool. So that's my pick.
BRIAN: 42:26	And my community pick, thinking through this episode and some of the things we talked about, there's a real focus from the community team this year to think about how do we service as many different personas as possible? We've done a great job over the years, to Ken's point, with the weekly challenge and thinking about how do we level up people's skills? And I think we started primarily with beginning users in mind. And now, we're ready to start thinking about folks that are more of, let's say, a developer type of persona. So we recently opened a developer area on the community and it's just getting started, but it's going to be a great resource as we build it out for folks that are interested in using SDKs and APIs and all those kinds of things to both get documentation, get code samples, talk with their peers, all of that kind of good stuff. So I would highly recommend if you're of that bent to check out the developer area of Alteryx community. All right, well, I think that wraps up for this episode. Thank you guys so much for being on. This was fantastic.
GARTH: 43:32	Thanks, Brian.
KEN: 43:33	Thank you, Brian.
GARTH: 43:34	Had a blast.

BRIAN: 00:06 Welcome to Alter Everything, a podcast about data and analytics culture. I'm Brian Oblinger and I'll be your host. We're joined today by Ken Black and Garth Miles to discuss big data, the process of creating great content, and the data scientist's role. Let's get right into it. [music] BRIAN: 00:28 Gentlemen, welcome to the show. KEN: 00:30 Thank you, Brian. GARTH: 00:31 Excited. Thanks, Brian. BRIAN: 00:33 Yeah, absolutely. It's wonderful to have you guys on. This is going to be episode two of the podcast, so it's just going to be great. Where I would like to start, as always, is with some intros. So Ken, let's go ahead and start with you. KEN: 00:46 Hi. My name's Ken Black. I work as a data scientist in the automotive industry. I've been doing computational science, various types, for over 30 years now. And in this podcast, I hope to tell a story about that and where I see modern analytics, where I see it now and where I see it changing in the future. GARTH: 01:10 Hi. I'm Garth Miles. I'm an engineering manager here at Alteryx and I'm really excited about this episode. I've been watching and following Ken for many years now and can't wait to be a part of this experience. BRIAN: 01:27 So one of the things I'd like to dig in first here is let's talk a little bit about how each of you came to analytics because I think as we're talking to more folks along the way here, we're realizing that one of the really interesting things about the industry is that people have really different stories about where they came from, what they were doing before analytics, and how they found themselves here. So Ken, I'd love to kick off with you. I know you have a great story about that. KEN: 01:55 Well, sure. Thank you, Brian. For my story to begin, we have to go way back in time to the mid-1980s when I was doing my master's degree. I did both a bachelor's and master's in geology with an emphasis on mathematics and computer science. 
So back in 1985 to '87 time frame, my master's thesis was called A Microcomputer Groundwater Data Analysis Program. What that really meant was that I combined data analytics with simulation, visualization, predictive analytics to do things related to groundwater remediation. So back then, of course, we had slow computers. We had very little memory. So a lot of the things that we were doing were limited in the computational realm by the hardware that we had at the time. Certainly, even back then, hardware was ahead of software but not by much and so we had to do a lot of things the hard way. So when I launched my career then, for the first 20 years, I took that experience and pretty much was working on groundwater remediation studies around the country. So computational models we were building were three and four-dimensional models that looked at how do you remediate contaminated groundwater? The purpose of that work was primarily environmental remediation for cleaning up groundwater supplies, but it was more than that. It was projects like the Exxon Valdez oil spill in Alaska, things like cleaning up groundwater in Cape Cod. That was a project that spanned 10 years. Another big project to work on was the environmental restoration of the Everglades. So these are major environmental projects, but there were also a lot of little ones where we were looking at cleaning up water supplies and national laboratories, places where we had radionuclide transport, where we had significant environmental problems. KEN: 04:01 So all along that pathway of doing that work, I did a lot of work in writing computational codes in languages like Fortran and C and C++, graphical processors to be able to take the data that we were calculating in the models and visualize it. 
So creating animations, creating 2D and 3D plots of contamination, and then, a lot of quantitative post-processing of the data to calculate things like contaminant migration, velocities, contaminant removal quantities, volumetric fluxes of groundwater, things like this. So for 20 years, I did a lot of computational science. And after 20 years of that, it was getting to the point where I thought to myself, "I'm really good at this and I could continue to do this, but I want to try to do something else." And at the time, I was coaching basketball and one of my friends-- one of the boys on my team, his dad had a company that did process improvement work. And so in about 2007, about 11 years ago I guess now, I switched over from environmental science to process improvement. And during that time, I got exposed to a lot of different types of businesses, a lot of different types of data coming from the businesses where we were focusing on trying to improve processes. The industries included medical manufacturing and transportation, healthcare, just a variety of things where processes needed to be improved. So part of what I did there was I rewrote the company's software. I reverse engineered it to bring it up to contemporary standards. And then that kind of began my journey into really kind of pure business analytics. KEN: 06:01 And so when you get training in statistical process control theory and you write software to do that, there's a lot of learning to be had. So I went from this scientific background to this very applied statistical testing methodology that we used. And then eventually, after about seven or eight years of doing that, I switched over to the automotive industry. So that's sort of the history of how I got to where I am now. But before I conclude that, let me just say that really in about 2008, I started doing a lot of visual analytics work using Tableau. And then about four-and-a-half, five years ago, I got Alteryx. 
So in the first five years of using Tableau, I took on more and more challenging projects and I got to the point where I couldn't really solve them exactly in Tableau. I had to do a lot of custom coding to augment Tableau's capability. But then once I got Alteryx, all of the custom coding went away and all the data prep went into Alteryx to drive all the work that I do. So everything that I do now goes directly into Alteryx. All the data prep and manipulation happens there and then results get kicked out for visual analytics. So that's sort of how I got to where I am. BRIAN: 07:24 Wow. Looking forward to you trying to top that one, Garth. GARTH: 07:28 Yeah, well, I have a followup question before I even try. But Ken, can you tell me a little bit about what languages you were using pre Altryx in your coding efforts? KEN: 07:44 Sure. A lot of the work that I did-- I say that I can program in 10 different languages and people hear me say that and they're thinking I'm talking about spoken languages but they're actually computer languages. So my master's was done in Turbo Pascal, a Borland product way back when, so I became proficient in Pascal, Fortran, C, C++, of course, HTML, XML, CSS, all of the web-based technologies, a little bit of C#, a little bit of Python. But basically, since I have been doing Alteryx, I just pushed aside almost all the programming and it's all done in Alteryx. So why should I spend my time writing custom codes when Alteryx does everything for me now? That's really where I am. GARTH: 08:38 Yeah, that's a great answer and something I've heard a lot. My story, to be, actually, pretty brief, and it's certainly not as storied and as exciting as Ken's, but I kind of describe it as the story of the phoenix, albeit I'm not the magical bird rising from the ashes. I'm just a bird, but I used to be in real estate. And in 2008 and '09, well, we all know what happened there. 
And I managed to make it out alive so to speak, nothing too damaging, but decided to use this opportunity to make a career change. And got into the world of data through a company that was a customer of ours as a GIS specialist - excuse me - and worked my way through that role for a few years. That's actually when I learned about Alteryx itself. I started using Alteryx to do some spatial processing and whatnot. And after a couple years, an opportunity at Alteryx came up and I jumped at it and I've been here ever since. So about four-and-a-half years I've been at this company, but my background started in real estate. And through a series of, now I look at it as fortunate events, I have joined a world that's been fun, exciting, and challenging. And I get to meet people like Ken and have conversations with them on a regular basis, so [crosstalk]. BRIAN: 10:21 And now we're stuck with you. GARTH: 10:22 I know. I know. I'm sorry about that, but it's good for me. Selfishly, I get to absorb all the knowledge from all of you people. And so maybe I'm greedy that way, but I'm enjoying it. [music] BRIAN: 10:41 So I think what would be awesome to talk about next, I know that, Ken, you have this background in maybe what I would call doing, air quotes, "big data." Although, happy for you to maybe amend that or give us the definition as you see it. Would love to hear your thoughts on what is big data? What does it really mean? How should people be thinking about it? And then maybe a little bit from you on kind of best practices or thoughts about how people should go about using this in their organizations or beyond. KEN: 11:14 Again, let's go back a little bit in time to about four or five years ago. When I first started using Alteryx, I had to learn it in the context of project work. I had two very intense projects which were given to me and I didn't have the luxury of learning Alteryx fundamentals. 
I had to learn on the fly and luckily for me, I had that programming background so things went pretty quickly. But in the first year of learning Alteryx, I realized that at the time, there weren't a lot of comprehensive examples of projects that I could go find, read about, and then replicate to learn Alteryx on my own. Now, things are very different, but back then, I basically decided to do a little study. My study was I wanted to do a full spectrum analytics project where anybody could do it. You could go out, get the data from the internet, process the data in Alteryx, visualize it in your favorite package, and then learn about the topic. And the topic that I chose was global climate. I wanted to know-- there'd been so much talk about global warming and climate change through the years that I wanted to know for myself what would happen if I went and got the data, did all this work, documented it, and then would I be able to draw conclusions from it? Because I'm not a climatologist or a meteorologist, but I am a geologist so I had a pretty good science background and pretty good computational background for understanding models and simulation frameworks and all this sort of thing. So what I did was in 2014, I wrote a five-part article series that I called my Alteryx manifesto, which is how do you go out, get this climate data? How do you process it? How do you visualize it? KEN: 13:16 And so it's something like How You Build an Alteryx Workflow to Create Visualizations in Tableau. It's some title like that. I can't remember exactly. Well, when I did that, it took probably at least more than 100 hours to do. So it was quite an undertaking, but in doing it, I learned a lot about how Alteryx can handle large data volumes because the climate data goes back into the 1700s and there's billions of records. 
So if you're going to play in that arena, if you're going to be able to look at data from more than 100,000 monitoring stations over a couple hundred-- over a couple centuries, you've got to be pretty efficient with the way that you handle data. So that was probably my first exposure to handling big data in Alteryx. Since that time, a year or two went by and I went back and I wrote a whole nother series of articles about my observations on climate change. So I've been able to refine the approaches that I set up in the beginning to process this data very efficiently because if you don't process it efficiently, it'll take you forever to get through and then you're going to end up with way more data than you want to use. So a lot of that stuff is documented in my article series. And people do read it and I suspect that some people have learned a few things from it. On my regular job, I've been exposed to even larger volumes of data from a few different projects. And I'm not going to go into detail about these projects, but there's lots of streaming data that's coming my way. And over the past, say, year or so, I've been working on developing techniques to efficiently handle data in Alteryx that's coming through a big data environment, either coming through Hadoop or Oracle or whatever the situation may be. KEN: 15:23 And I've been working on establishing these methods and writing and documenting them. So a lot of those techniques are available now if you have the patience to read the articles and follow the training videos that I've created. So I guess what I want to say to you, Brian, about this is that, to me, data is data. How big it is, how fast it's created, all of that stuff, a lot of that's buzzwords to me. What I'm trying to do is uncover the stories hidden within the data. So whether it's massive quantities of data or small quantities of data, my techniques are a lot alike. 
I use various things like aggregation, slicing and dicing in different ways, and I think a lot of that goes back to my computational framework. And that's why I spent the time to talk about it earlier where you're building these three and four-dimensional models and you can't see things in 3 and 4D that easily, so you have to slice and dice. And that's kind of what I do with business analytics, too, is I use those same approaches to handle large volumes of data. So in this one case that I'm talking about here where I'm receiving, say, 400 million records of data per month from millions of objects that are creating this data, one of the things I've realized is that if you compartmentalize that data into smaller subclasses of information, so let's say that you have 60 different types of equipment that create this data, if you store the data by those 60 different pieces, that's manageable and it makes it a lot easier to retrieve the information when you need it. KEN: 17:19 So instead of storing everything in one gigantic file, if you break it down into smaller pieces, then you can retrieve what you need quickly, you can write workflows that are very efficient in extracting the information that you need when you need it. So a lot of the techniques I'm developing now, sort of what I consider to be kind of on the cutting edge of analytics research for dealing with big data, I'm doing that stuff in Alteryx. And of course, we have competing platforms that I could be using, but the reality is I've been able to do it very effectively with Alteryx so there's no really reason for me to try anything else. GARTH: 18:00 I think that actually, Ken, you've made several great points. And a couple more maybe nuanced ones are the debate, which I think everyone seems to be narrowing in on kind of an agreement here, but the debate about what is big data? 
And a great book came out, I don't know, a couple years ago, I can't even remember, I read by a person named Viktor Mayer-Schönberger, I believe. It's called Big Data: A Revolution That Will Transform How We Live, Work, and Think. And made a point in the book about how big data is basically-- his definition was it's when the data is large for your processing environment, right? So nowadays, you're talking 400, 500 million records and you're talking about terabytes, maybe petabytes. I mean, that's what most people often think of as big data, but it really is what your environment can handle. And to Ken's second point about breaking things into small parts, small slices of value if you will, it makes the data-- it starts converting the data, if you will, into information - right? - and making it manageable at each step and really improving your workflow process. Specifically, how you're handling and prepping the data for any sort of analytic purpose is-- I believe the CRISP model is 80% of your effort if not more now. Just to get to the starting point takes 80% of your time and effort. So I've read, Ken, your series. It is rich with information. It's long. There's a lot of-- there's a lot to absorb there, but I highly recommend everyone take the time. You'll only be better as a result if you read your series. KEN: 20:09 Thank you. One of the things that I wanted to say was that if we talk about the buzzword of big data and I go back in my career, I was already using billion line files in the '90s. We didn't call it big data back then. We just called it model input. And so we're running these really large models for doing things like designing locks and dams on the Mississippi River, finite element models where you have millions and millions of cells. And so you have these high-resolution models and you need big files that provide the input to these things. So back then in the scientific community, we weren't thinking about, "How big is this data?" 
It's just you did what you had to do to get the job done. In business, it seems like there's more of an emphasis on, "Oh, look at how much data we collect." Well, just because you collect it doesn't mean it's going to be perfectly suited for your analysis. Pretty much, I have a couple of mantras. Number one is all the data that I've ever been given, never once has it come in the right form for me to use it by itself exactly the first time. I always have to do something to data. And number two, I treat all data as guilty until proven innocent. So there's always data QA. There's always things that have to happen to data. And I think this is the role of Alteryx that makes it such a valuable tool because all of those things that have to happen can happen within the context of an Alteryx workflow. And for me, in the beginning, when I first started using Alteryx, I thought it was just a collection of tools which do these little, independent modifications to data like transpose the data or add a formula, just individual, little operations like I had done through all the years of working the hard way, as I call it. KEN: 22:14 When you get to the understanding of what Alteryx really is, how it's a beautiful implementation of an object-oriented programming platform where all the tools work together, they work harmoniously, they're designed to support each other, I mean, you can run them independently, but you can build tremendously capable workflows for big data or small data. And you can do it so fast and they're so repeatable and so reliable that it's like you write these custom codes in a fraction of the time and then you can just run them over and over and over whenever you need. And so for me, one of the great breakthroughs was my comprehension of what Alteryx gives me in terms of a holistic platform from ingesting the data to modifying the data to writing it out in the form that I need it to go, whether it be quantitative or visual. Everything can be done there. 
That is where it's so transformative working with big data is that it's not only fast-- well, listen, things are fast for two reasons. Number one, you have fast hardware. Number two, you have optimized software. Alteryx is optimized, has been from the beginning, for great throughput. The reading and writing from hard drive, that's kind of a fixed entity. I mean, we're stuck with what we've got right now until faster hard drives come along and better technology. We can only read and write so much data. But the key things is that Alteryx has this great bandwidth with this great pipeline that allows the data to come in quick, be processed quick, and get out quick. And that's what makes it so special to me is that there's no limits being applied to what I do on the daily basis in Alteryx. GARTH: 24:06 Yeah, and that's a good point. And you made a really good one, I think, early on that the value of data-- just to sum it up is the value of data is not in its bits and bytes but the information it contains, right? And so the processing and extracting that information is the value. And I agree with you. The fact that our tools have empowered-- it's the great enabler, in my opinion, our platform, because you have brilliant minds who just perhaps don't have a programming background, but that's not what's important. What's important is being able to derive the value and the insight from data. That being said, I also think of at least Designer as a great enabler because if you do hit the limits of the platform, which it's becoming harder and harder to hit those limits as we update with new and improved and more offerings and more features, you have the ability to step outside of the platform or step outside of our products and use tools that are better suited for the job at hand. And for example, you can write up a script and run it through a Run command tool and execute that script via Run command tool if necessary. 
And lastly, I mean, sometimes people are just comfortable with what they're comfortable with. And I feel like Alteryx can abstract away some of the annoying or painful work and some of the prep and blend. And then some of the more quantitative stuff if you're more comfortable with Python or R or some other language, execute it in that environment and then bring it back into Alteryx. So not to get too off-topic, but I agree with a lot of your points. And, yeah, want to just double down on what you're saying, Ken. [music] BRIAN: 26:06 So Ken, you've talked a little bit about your blog and how important it's been to you. One thing that I think would be really interesting is to hear how has that helped you in your career? How has that helped you inform your opinions? And what have you learned from technical blogging? KEN: 26:21 So when I launched my blog, it was really kind of a clandestine scientific experiment. I wanted to do a long-term blogging experiment. I couldn't find any information that somebody had actually done this before. So for two-and-a-half years, I ran this sort of process improvement-based blogging experiment without calling it that. And so I wrote about a lot of different techniques. I wrote, initially, about 170 articles to find out what people were interested in, what they didn't like, what were they responding to? How long did it take to get a readership? And so after two-and-a-half years, I just said, "Okay, done. End of experiment." I kind of explained the experiment and then talked about some of the conclusions in the epilogue. And then I decided, "Well, am I going to continue this or not?" Well, I continued it and the reason I continued blogging was that-- and this is getting back to the answer to your question is what have I learned? 
Number one is when you write a technical blog, it's a scary proposition because there's always people out there who are smarter than you, who are better than you, who can do things differently than you, more efficiently, and you sort of put your vulnerabilities out there on display. And so when I first started writing, I was writing in a vacuum. I was just popping these articles out. They were coming into my head. I was writing them. And it was basically based on all of those years of experience of me using Tableau and then eventually Alteryx to create these ideas and techniques. And so I decided about after 100 articles that I was changing from just writing a technique-based blog to a problem-solving blog. And the reason for that was in my process improvement work, the work that we do in analytics is to solve problems and to make things better. KEN: 28:18 So my mission then changed from just writing about certain techniques in Tableau or Alteryx to how do you solve problems holistically? And so a lot of people have said, "Oh, your blog's too technical," or whatever. "It's too wordy." People have told me, "I don't read it because it's too long." The thing about it is is I've never had that opportunity to read something like that when I was learning so I wanted to give back by writing a very in-depth coverage of these topics, which are real time important topics. These are all of us collectively using tools like Alteryx, Tableau, and other things. We're all trying to get better. We're trying to make the world a better place. And so if I could share the things that I've learned in the context of a blog, then I thought that that was a good thing. And that's what I've been doing. And so eventually people find it, they read it, but it's kind of a thankless profession. I wouldn't call it a profession. It's a volunteer activity. So you just have to do it. It's not easy. It's not necessarily that much fun, but the most unexpected thing that happened was it made me better. 
I didn't expect it to but it did because it gives me a memory of what I've done, how I did it, why I did it. And I can look backwards in time through my history of my blog and I can see where I was back then and where I am now. And I can see my self-improvement and that's one of the best things about it. BRIAN: 29:53 Yeah, and I can actually confirm for you now, live, here on the podcast, that creating content and putting it out for the people is a career for both Garth and I. So I understand what you're saying though about the thankless comment you made. I think it's more you hear from people that have feedback, some of it critical, some of it positive. But what you don't hear and I think this goes back to a point Garth had made earlier, you don't always hear from the people who wandered by your blog or your community or in Garth's case, the content he's creating around Alteryx products, you don't always hear from the folks that click into it, get a tremendous amount of value, and then they put it into action and go about their day, right? And I think that's the thing you got to get comfortable with is someone that's putting out content is you're not always going to hear from the people who get the most value out of it. And you just have to trust the process that, "Hey, I know what I'm putting out is good and I think other people are going to get a lot of value out of it," and that's all that you can do. And I've read a lot of your blogs. I'm not as smart as you or a lot of people probably that read it, so the value that I get is a little bit probably different than other folks, but I think it's obviously a tremendous resource. And for those listening to the show here, we'll put the links, obviously, in the show notes over at community.alteryx.com/podcast. 
[music] BRIAN: 31:26 One thing I'd like to go back to for just a moment, I was thinking as you were talking through earlier, Ken, your background in scientific arenas and then transitioning into kind of the business world and then going back and forth with your blog and so forth, when I think about the term data scientist - and this is another term we're hearing a lot data science, data scientist - the question I have for you is when I think of you as a "data scientist," I guess earlier on in your career, you were a scientist that used data, right? And now, in the business world, maybe you've transitioned to a data worker that's applying that science to get to the end goal or the answer or the insight. I'm just curious if there's some type of crossover or correlation or how do you see those two kind of? Are they different? Are they the same or do you wrap them all into one process? I'm just kind of interested to explore that further. KEN: 32:28 I wrote an article a few weeks ago-- a couple months ago called something like Data Science: Why Data Must Come Before the Science. And it doesn't matter what advanced technique you're using, it could be TensorFlow from Google, it could be whatever machine learning algorithm, if you don't get the data right before you give to this platform, then what you're going to get out of it is noise and not very useful. So really, I think data comprehension and learning how to handle data properly is the most important thing young workers need to work on earlier in their career. And if they get the data part right, if they can use a tool like Alteryx to really understand how to manipulate and control data, then that's going to set them up for more advanced stuff in their life so that they can become "the data scientist" that does machine learning or the data scientist that does logistic regression modeling, whatever it may be, whatever their forte is. The main thing is they've got to work with the data. 
And that's one of the biggest deficiencies that I see in incoming workers is they don't know how to handle the data, they don't know how to manipulate it, they can't corral it. And that's where the role of Alteryx is in my opinion.
BRIAN: 33:48	Yeah, and we're hearing this a lot right now where companies are coming and telling us, "Hey, we have all this need" - right? - "for data scientists and data analysts, but there's a shortage where there's just not enough. We can't hire them fast enough." As an organization, we're attempting to insert Alteryx into university programs. Of course, we have the Udacity Nanodegree for business analysts. We're trying to create other learning opportunities through the community with Alteryx Academy, but I think you're making an interesting point which is maybe the best way is just to have seat time, right? You're there with some data and applying it in a real-world scenario. I mean, how would you weight it? Do you think that the instructional piece is good to augment or is it both? Is it one or the other? What kind of mix-- if you were going to bring someone up today, how would you set them off on a path to get where you think they need to go?
KEN: 34:52	Good question. Typically, what I've done with the incoming workers that have worked with me is I give them a task and I don't give them Alteryx initially. I just say, "Here's the task. This is what I need you to do. Go do it. Use what you learned in school to do it." And then they'll go do it. They'll struggle through it for a day or two or three and they'll come back and say, "Hey, look, this is what I did." And then I'll sit down with them and I'll do it in Alteryx in about 20 minutes or 10 minutes or whatever the task may be. And they're just astounded because that's the best way for me to teach them is I make them do it the hard way, then I show them the easy way.
And when you begin to see those things over and over and over and you learn that the reason that Alteryx exists is because it makes everything easier, then you want to use a tool like that because it makes you very productive. You're not spinning your wheels for three days trying to do something in SQL or whatever technique they had to do. So with the people that I've tried to influence, I've given them those kinds of tasks but I've also given them-- basically, when I find interesting blogs or I find interesting stories, I tell them, "Go read this one because this is important," or on the job, when a job comes to me that I can handle, I send it out to them and say, "Try to learn this tool to do this." So it's really getting hands dirty with the data, Brian. That's really the most important thing. I mean, you can sit back and read and study, but when you work with the data and you apply the techniques, that's when you learn them the best.
GARTH: 36:35	Yeah, good point. So if I were going to maybe answer your question, Brian, about what does a data scientist represent or what is a data scientist, who is, people from my team and who've been on my team in the past, they may recall that I have a pretty rigid definition or expectation of data scientists. And I think it all speaks to Ken's point that first and foremost, it's time in the seat, right? It's time doing the thing. So back in, yeah, July of 2013, I'm going to botch this name, but an individual named Swami Chandrasekaran, I believe, I can put it in the show notes, but wrote this blog about becoming a data scientist and actually created an infograph called the Curriculum via Metromap. And it pretty much just outlines if you really want to be a data scientist, in this individual's mind who I happen to agree with, these are the things that you need to be at the very least proficient at and it covers the gamut.
And kind of to double down on Ken's point, starting with the fundamentals, I'll use - excuse me - a common analogy or a comment you'll hear when people say, "Hey, I want to be a front end developer. I want to use Angular," or, "I want to use React," or, "I want to use whatever framework is the hotness these days." And the most pragmatic experts in the field that I follow and listen to, they say, "Well, learn plain vanilla JavaScript first. Don't use the framework that abstracts away all of the knowledge that has been gleaned over the years," right? "Understand what you're doing and why you're doing it and then you can make the decision on the tools that you want to use" - right? - "to make your life easier."
GARTH: 38:31	In a way, Alteryx is just another abstraction library - right? - to solve problems that people have been trying and Ken has been solving for years and years, right? But a data scientist really is the person that has the fundamentals down, a quantitatively minded individual, understands visualizations and communicating information, as well as up-to-date with common machine learning methods and practices and has a fairly broad toolbox, if you will - right? - comfortable in a lot of different proprietary or open-source platforms and is really for the most part, I would say, maybe at a mastery level of a lot of those different areas. And to be honest, there's nothing wrong with not being a data scientist. You can be a quantitative programmer, a mathematical programmer, a data analyst, a data engineer. You can be all these other things that are really, really complicated, sophisticated roles. But I think we use the term data scientist a little too loosely now and it sort of muddies the water of the work and the skills and the tools that really, in my mind, comprise what a data scientist is.
	[music]
BRIAN: 39:59	Okay, so last but not least, let's talk about our community picks.
So what have you found recently in the Alteryx community that you find interesting or you'd like other people to take a look at?
KEN: 40:08	The weekly challenges, for me, have been not something that I do. I just haven't had the time to do them, but a few months ago I was going into a coding competition at work and so I took some time to go through the history of the weekly challenges. And when I did that, I realized that there's a lot of examples in there where there's these beautiful little techniques that are stored in these different solution techniques from people who have solved each of these different challenges. And so I wrote about that in a post called The Finer Things in Alteryx. And I think for people who are new to Alteryx who are looking for inspiration or looking for rapid learnings, they could go to the weekly challenges and spend some dedicated time, an hour or two, and go through 10, 20 examples and sit back and just look at what people have done, look at the way that they've attacked the problem, and learn from that. And so what I did was going into my top coder challenge, I wrote this piece. I extracted all of these techniques and then I had them at my fingertips ready to use when I went into the coding challenge. So for me, the weekly challenges have a lot of value.
GARTH: 41:29	My pick is there was a thread started all the way back in November 2015 by Ada Perez. I always mess the name up, but the thread is available big data sets over the internet. And this may not be common knowledge yet, but one of the commenters in this just ongoing thread, it's still pretty active, LordNeilLord is the user name, posted that FiveThirtyEight actually opened up all of their data sets, posted the link in there. So now, you get to go-- you can go to FiveThirtyEight, actually grab the data set that they wrote an article about, and if you want and you have the time and energy, you can try and replicate the results that they've uncovered.
It's kind of fun, pretty cool. So that's my pick.
BRIAN: 42:26	And my community pick, thinking through this episode and some of the things we talked about, there's a real focus from the community team this year to think about how do we service as many different personas as possible? We've done a great job over the years, to Ken's point, with the weekly challenge and thinking about how do we level up people's skills? And I think we started primarily with beginning users in mind. And now, we're ready to start thinking about folks that are more of, let's say, a developer type of persona. So we recently opened a developer area on the community and it's just getting started, but it's going to be a great resource as we build it out for folks that are interested in using SDKs and APIs and all those kinds of things to both get documentation, get code samples, talk with their peers, all of that kind of good stuff. So I would highly recommend if you're of that bent to check out the developer area of Alteryx community. All right, well, I think that wraps up for this episode. Thank you guys so much for being on. This was fantastic.
GARTH: 43:32	Thanks, Brian.
KEN: 43:33	Thank you, Brian.
GARTH: 43:34	Had a blast.


2: All data is guilty until proven innocent
