To kick off the New Year, we wanted to re-visit our most popular episode of 2024! In this re-visit episode of Alter Everything, we talked to Wayne McClure, a senior solutions architect at Nielsen, about using Alteryx strategically within a tech stack. We discussed the comprehensive benefits of Alteryx, best practices for complex deployments, and integrating Alteryx with other technologies such as AWS for effective data orchestration. Start off your year right with an Alter Everything classic!
Ep 175 Leveraging Alteryx in Your Data Pipeline – Revisit
[00:00:00] Megan: Hi everyone. We recently launched a short engagement feedback survey for the Alter Everything podcast. Click the link in the episode description wherever you're listening to let us know what you think and help us improve our show. Welcome to Alter Everything, a podcast about data science and analytics culture.
I'm Megan Dibble, and today I'm talking with Wayne McClure, a senior solutions architect at Nielsen, about how to use Alteryx strategically in your tech stack. From using Alteryx as a tool in orchestration to best practices for complex deployments, Wayne shares some great insights. Let's get started. Hey, Wayne, it's great to have you on the podcast today.
Thanks for joining.
[00:00:44] Wayne: Hey, Megan. It's a pleasure to be on here. I'm what you could call a long-time listener, first-time caller. I've been listening to your podcast since, I think, episode 15, so it's great to be here.
[00:00:54] Megan: Oh, that's awesome. We love to hear that. Yeah, if you could introduce yourself for our audience.
Give a little background about what your current role is.
[00:01:04] Wayne: Sure. Yeah. So my name is Wayne McClure. I'm a senior solutions architect for Nielsen. Nielsen is a media company that handles the marketing trends of viewership. And I'm on the platform team that makes sure that that information is posed to the right people with the analysis built in.
[00:01:24] Megan: That's great. I remember when we chatted earlier, you talked about how you're always trying to answer the question, what does the enterprise need to use BI effectively? I thought that was a great way to phrase it.
[00:01:35] Wayne: Yeah. I guess if we think about a little bit of my prior journey, I started out in the technology world in the consulting space, in kind of a backwards way.
A lot of people start off in a large organization and then branch out to consulting. I joined a company that went into small and medium businesses that were looking for the opportunity to engage their data. They had become data aware, but didn't know what to do with it. They understood that analytics and data science were a thing, but still hadn't maybe graduated past the magic wand that the upper echelons of the C-suite might see.
So I'm used to looking for how analysis can best serve the enterprise itself, and with Nielsen I've had the opportunity to explore just that very deeply.
[00:02:24] Megan: I know you guys are using Alteryx. I'd love to hear from you, in your opinion, what situations are best suited for using Alteryx?
[00:02:33] Wayne: I think one of the main benefits you find with Alteryx is, a lot of times in technology, especially when you think about technology stacks that are deployed, there used to be a use case for saying, get the all in one tool or get the best in breed for each particular stack or the best database, the best ETL tool, the best analytics tool, the best visualization tool.
And I think that still stands very importantly. One of the things I think companies and enterprises hyperfixate on is that they think of only one opportunity. I think Alteryx opens up a way for analysis, for data delivery, for ETL, even database management for those that maybe aren't database people. So it really opens up the capability of the subject matter experts, the people that understand what that data pertains to, what those measurements mean, what those people and numbers in an Excel spreadsheet are, but maybe don't have the technical fortitude to write a Python script or to set up an Airflow function or some other form of highly technical capability.
Alteryx brings that to them and allows them to quickly iterate.
[00:03:40] Megan: That's awesome. Well said. And I know when we were chatting before, I think you had said Alteryx isn't the end all be all, but neither is software development, so how do you balance those two things and choose what to use when?
[00:03:55] Wayne: I think it's a balance of need.
Again, coming from the consulting world and even in the large enterprise, global deployment that we utilize right now, we're in multiple countries, we have multiple iterations of AWS accounts, we have thousands of team members working in analytics and data science, and there's always the question of how we are going to balance the need for technology and the specialists that can handle that technology.
For databases, you really do need a database specialist. For software, coding, back end, front end, UI, UX, all of that, you need those specialists. You need the employment, but you also have to be able to get that data quickly to the end users who are going to consume it, whether they're making business decisions or providing the end content to your customer base. Along those lines, it's very easy for organizations to narrow in like that.
They hit their lane, they get that HOV lane on the expressway, and they are just Power BI all the way, Tableau all the way, SAS all the way. And not every problem can be a Tableau nail and hammer, and neither can Alteryx be the nail and hammer for everything, but be able to use everything in your toolbox.
It's just like cooking. I love garlic. But I am not going to put garlic in everything. Well, I guess there's an argument to say that you can, but that's
[00:05:29] Megan: true. No, I'm just kidding. I love garlic.
[00:05:32] Wayne: It is a truth. Yeah, there are, there are reasons to add certain spices. And it's the same thing in a technological world.
There's reasoning behind it. And there are even use cases where you can explore the deeper data science, heavier programmatic or skilled technical knowledge, and wrap it up within an Alteryx environment so that it's then able to be utilized by non-technical people downstream. And I feel that that's probably part of the beauty of using Alteryx in your tech stack: you can marry the non-technical business user with the highly technical specialist and bring them together into a synergy of innovation, bringing the product to the market very fast, accurately, and repeatably.
That's the beauty.
[00:06:18] Megan: I think some features like creating apps in Alteryx are a good example of that. There could be some really complex stuff going on behind an app. It could be pulling from so many source systems, things that data engineers set up, but then you have this app that anyone in the business can run, and use cases like that are super powerful.
And I like what you said about using Alteryx as a tool in your tech stack. It doesn't have to be an either/or, but it can be a combination of things. So I appreciate that answer.
[00:06:53] Wayne: I think we're in that phase in the maturity of the BI space where the solutions that are out there allow you to blend palettes.
Borrowing from that painter, there are these happy accidents, marriages of convenience. We deploy a lot of our Alteryx in AWS. We use a lot of cloud structure to handle our servers, as well as the communication across different lines of business, and that lets Alteryx be actively accessed through other tools, using APIs or AWS Lambda functions to communicate and send the data or to trigger workflows beyond the standard "I need it to run Monday through Friday at 6 a.m."
[00:07:35] Megan: Yeah, and that's a great segue into what I wanted to talk about next, which is orchestration and how you and your team are using Alteryx while still working outside the Alteryx box, like using it as a tool in our orchestration. I'd love to hear more about that.
[00:07:51] Wayne: Sure. So we have a fairly large user base of Alteryx developers, and we have 2,000-something workflows running in production a day, but we have active development beyond that.
And when you scale to that size, where you have a non-production test environment and a locked-in production environment and the need for data to come across, you know, 2,000 workflows in a single day, you can't just stack them like cordwood. They come at different paces and different spaces.
You can't necessarily schedule a workflow at 9 a.m. when the data delivery is supposed to be between 7 and 11, but you need that data as soon as it is ingested into its source system. So we use AWS and our cloud architecture and tools to expand upon that and enrich the platform. The use case we talked about previously, Megan, was that we have a large media data lake, the MDL. It's a big Apache Spark database system that houses literal petabytes of information.
In general, it ingests between three and six petabytes of data a week. And to be able to pull that data when it's delivered requires dancing. And our dancing is using orchestration like SNS notifications, so the Simple Notification Service within AWS, which lets us know when data has been loaded into the MDL.
It goes all the way down to the schema and table. So we know when a table is loaded the instant it is completed. And then that will trigger software in AWS. We use primarily Lambda functions that will orchestrate the starting of Alteryx workflows, the sending of messages to other systems, and the triggering of stored functions within our Postgres environment.
And so everything is like the nerve strike kind of visualization you see in movies, whenever the hero gets a hit to the chest and all those nerve endings spread out and branch out from the source of the hit. And that's what we're looking at: we want the single source that just kicks off an immensity, not just a singlety, right?
Because we have an immensity of data that has to be moved. And so, to move it in a way that is cohesive across everything, we have the first Rube Goldberg ball drop that starts the whole kaleidoscope of effort.
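[Editor's sketch] The fan-out Wayne describes, one "table loaded" notification starting many downstream jobs, could look roughly like the Python below. This is a minimal illustration, not Nielsen's code. The `Records[].Sns.Message` envelope is the shape AWS uses when SNS invokes a Lambda function; everything else is an assumption made for the example: the JSON payload with `schema` and `table` keys, the `WORKFLOWS_BY_TABLE` routing table, the workflow names, and the `start_workflow` callable, which stands in for whatever actually queues an Alteryx job.

```python
import json

# Hypothetical routing table: which workflows wait on which data-lake table.
WORKFLOWS_BY_TABLE = {
    ("media", "viewership_daily"): ["wf-ratings-rollup", "wf-client-extract"],
    ("media", "ad_exposure"): ["wf-ad-report"],
}


def tables_from_sns_event(event):
    """Yield (schema, table) pairs from an SNS-triggered Lambda event.

    Assumes each SNS message body is JSON with "schema" and "table" keys.
    """
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        yield message["schema"], message["table"]


def handle(event, start_workflow):
    """Start every workflow registered for each table named in the event.

    start_workflow is a placeholder callable supplied by the caller; it is
    not a real Alteryx or AWS function.
    """
    started = []
    for key in tables_from_sns_event(event):
        for workflow_id in WORKFLOWS_BY_TABLE.get(key, []):
            start_workflow(workflow_id)
            started.append(workflow_id)
    return started
```

Passing the trigger in as a callable keeps the routing logic testable without any cloud account, and a table with no registered workflows simply starts nothing.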
[00:10:13] Megan: Six petabytes of data a week. Not sure I can really conceptualize that. But a pretty complex solution is needed for that kind of volume of data, I'm sure.
[00:10:25] Wayne: Yeah, it is. And in the analytics and data science space, we cover a fraction. Obviously, that data is used for the organization as a whole, and we're tapping into that. But just like anything else, when you're feeding that lane, that lane has to be open and available. And it's a concept of scale. With Alteryx or with any tool in general, it's very easy for us to say, here's a Designer license.
Go build something. Here's a server we've stood up. Schedule it. Create macros to allow end users to access it from a web UI. Where do we go from there? How do we take that and crank the volume to 11, as they say, right? How do we take it to the next level? And you have to have teams and people within your organization that are willing to keep their head on a swivel and look for those opportunities to continue to expand functionality, not necessarily make that same tool bigger.
You know, our production server environment is a highly available cluster of nine machines. That's how we scale Alteryx, true. But how do we get the orchestration scaled in that same way? We can't just have everything on a single scheduler or on a single time-based process. So that's how we looked for those solutions outside the box.
[00:11:44] Megan: That's awesome. It would be great to go ahead and talk a little bit about best practices from your experience and what steps you take for a complex deployment of Alteryx.
[00:11:57] Wayne: So I really, really liked the last podcast that you guys had with Mara and the Amy organization, because she talks about a similar concept in the AI space.
And it boils down to intent and understanding your intent. We so commonly build solutions for the problem in our head, right? We have six Excel spreadsheets that need to be ingested. We have emails that get sent out or a database that gets loaded, and it has to be aggregated and brought together. And that's the problem, and we're sourcing the solution.
But we also need to know why that data is coming and who that data is coming from, and look further upstream. Like that Excel spreadsheet that ends up in a SharePoint folder every Monday at 9 a.m. Look beyond that. So often organizations do what I call the bucket brigade. This team handles data A, blends it with data B, and hands it down to the next team.
And they take a little bit of this data and a little bit of that. And it just moves down the assembly line. And at the end of the day, the true knowledge is lost. You ask the end user or the person in the middle of that chain, why are you getting that data? Where are you getting it from? And so often that's lost.
I don't know. I'm just doing things by rote. And having the intent, then, of understanding not just what you're working with and the problems you're trying to solve, but also where that data is coming from, how it's coming in, and why it's coming in in that process and that order, really gives an organization a better opportunity to build a better mousetrap.
We're all solving problems. We're all developing answers to questions. But we're looking at the minutiae when we should probably be looking at a larger space first. So a best practice is having somebody looking at the big picture, if I take that and spin it larger, Megan. You know, look at the outside.
Look at everything that's coming in. Have a big picture guy or a big picture girl or a big picture team, depending on what your organization needs, that is able to ask those questions, even if they're the uncomfortable ones. Why are we doing this? Well, we've done it that way for 10 years. But why have we been doing it that way for 10 years?
Let's look at what we can change. And if it doesn't need to be changed, that's okay. That's an acceptable answer too. We'll keep it as is, is an okay answer. We don't always have to rewrite the rulebook. Asking those questions and then looking for the solutions that aren't necessarily the ones you've already done.
And I think that helps drive progress.
[00:14:41] Megan: Definitely. I agree. I think having a big picture person or team is super important. And it sounds like knowing the big picture of your data pipeline is important. Knowing the steps that the data took and not just having that situation like you talked about where each person only knows a small piece.
And so I'd love to hear from you about how you could use Alteryx as your data pipeline.
[00:15:09] Wayne: Sure. One of the beauties of Alteryx is the fact that it is a low-code solution that you can easily schedule. So that is inherently a data pipeline, right? You can connect to a data source, you can connect to multiple data sources,
aggregate and organize data in a way that's valuable to you, and put it somewhere. That's ETL by its definition, right? And you can use Alteryx as that intermediary component, taking the raw data as far upstream as you can possibly get to it and doing those aggregations. And the nice thing about Alteryx is it allows you to take this seemingly unknown stuff.
I think it was at the Alteryx Inspire Conference in 2021, I spoke and we talked about using Alteryx to rapidly iterate against old processes. We come into a process where the SQL is lost. The tool had been retired five years ago. It was just a black box humming in the background, you know, and
[00:16:08] Megan: I think that's pretty common for, I bet there's some listeners who can relate to that problem.
[00:16:13] Wayne: Yeah. Documents are lost. So Alteryx allows you to take it and refactor that into a process that now has documentation, that now has a UI visualization of how the data is orchestrated. And you don't have to keep doing that beginning select-all, where you bring in 8 million rows and then filter it down to the thousand-row output Excel spreadsheet that gets delivered to an email.
Once you've done that first one, go to that version 2.0 and take those initial data tables and those initial SQL queries. Code that into your SQL query, so that now your Alteryx job doesn't take 33 minutes, but it takes 13 seconds. Because you've done all the hard work using Alteryx to solve the question first, then you place it in its appropriate spot.
SQL doesn't need to be handled in a Filter tool, right? If you're filtering right after your Input Data tool, that should be part of your SQL query. But it's an acceptable practice to use Alteryx to bring it in natively, explore, and know exactly the intent of what you want your end output to be. And then adjust it to fit the entirety of the role.
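[Editor's sketch] The "version 2.0" idea above, moving a filter out of the workflow and into the SQL that feeds it, can be shown with plain Python and an in-memory SQLite database. The `viewing` table, its columns, and its rows are made up for illustration, and this is not an Alteryx workflow; it only shows that both versions return the same rows while the second pulls far fewer across the connection.

```python
import sqlite3

# Made-up sample data standing in for a large source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE viewing (region TEXT, minutes INTEGER)")
conn.executemany(
    "INSERT INTO viewing VALUES (?, ?)",
    [("US", 30), ("CA", 45), ("US", 60), ("MX", 15)],
)

# Version 1: select everything, then filter downstream (the exploratory build).
all_rows = conn.execute(
    "SELECT region, minutes FROM viewing ORDER BY rowid"
).fetchall()
filtered_downstream = [row for row in all_rows if row[0] == "US"]

# Version 2: push the same condition into the query so the database
# only returns the rows the output actually needs.
filtered_in_sql = conn.execute(
    "SELECT region, minutes FROM viewing WHERE region = ? ORDER BY rowid",
    ("US",),
).fetchall()
```

On four rows the difference is invisible; on the 8 million rows in Wayne's example, the second version avoids moving almost all of them.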
[00:17:23] Megan: Mm hmm. That's great advice. And when you talk about, you mentioned sending the data to the appropriate spot, what do you mean by that? You mentioned an Excel worksheet. Do you write it back to databases? What would the appropriate spot look like?
[00:17:40] Wayne: I think for any organization from two man analytics teams in a small business of 20 people to large organizations with thousands of employees, that answer is as varied as there are stars in the sky.
But in general, if we take a look at most use cases, it's data that needs to be evaluated or acted upon in the BI space, right? We use this to move the business forward. We use the questions that are asked prior to the data process being written, to be answered on a relatively normal cadence: Mondays, Tuesdays, every day at four o'clock, whatever that case may be.
And then that is used for the intelligencing purpose, right? This is why we'll never get rid of people at the end. There's got to be somebody who presses that switch to guide the ship based on the results, and how that data is delivered really doesn't matter. It can be an Excel spreadsheet.
It could be a database. It could be a Tableau Hyper file that's a dashboard that somebody views. So that's also the concept: you're probably going to have a little bit of everything there. It depends on what the use case is. If we're sending data to HR, Human Resources, and they're going to have sensitive PII data, we probably don't want to toss it into a shared folder somewhere.
We also don't want to have it emailed across somewhere. So make sure you tailor it to the need of the business and the need of the end user specifically. So that goes back to that concept I talked about, knowing with intimacy your project's scope, from where the data starts to who it needs to be given to or who is making the final decision.
That end user is a very integral part of sound BI as far as I'm concerned.
[00:19:28] Megan: Totally. I'd love to wrap up with hearing about what makes you excited for the future.
[00:19:34] Wayne: Sure. Yeah. Well, I mean, there's a lot of really fun stuff that has been debuted within Alteryx recently. There's a lot of fun stuff just in the tech world in general.
We talked about last week's podcast, or the several weeks back podcast, with Mara on the AME technology, and the integration of artificial intelligence is an interesting point. And people are starting to wrap their minds around how that can be applied in analytics and in Alteryx. I look at how we can then use the concept of large language models or artificial intelligence to help streamline the things that our analysts and our data scientists don't want to do.
If you ask anybody what they don't want to do, it's the documentation. That's why we have those black boxes humming in the back corner for a decade with no documentation anymore. Because nobody wrote it or they don't want to write it. And Alteryx debuted the tool, the plugin that allows ChatGPT to do an analysis of the Alteryx workflow and then to create commenting and documentation, doing annotations of the Alteryx workflow itself.
And so we're looking to take that and, much like I've talked about previously, crank that to 11. We use a documentation platform, Jira and Confluence, to allow for documentation across all of our processes. Somebody can go to a Confluence board, search for a Lambda function or a job name in Alteryx, and find that job,
with a little bit of information about it: who developed it, what it does, how it works. With the kind of integration that ChatGPT has, we can build an Alteryx workflow that just goes in and catalogs every workflow on our server. It creates a metadata output that can be loaded as the actual Confluence page,
with hyperlinks going to the actual job location in its studio, with the data sources and tables, just summary events. So that documentation is now correct and organized in a way that's uniform across all users, because it's not 200 people writing something they don't want to write. It doesn't depend on how, you know, adventurous they are, or whether they do the job well or not. It's uniform and it's accessible.
And best of all, nobody does anything except for the scheduler.
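[Editor's sketch] The cataloging idea comes down to one renderer producing every documentation entry, so the pages are uniform regardless of who built the workflow. The Python below is a minimal sketch under stated assumptions: the metadata field names (`name`, `owner`, `summary`, `sources`, `url`) and the plain-text page layout are hypothetical, and no Alteryx Server, ChatGPT, or Confluence API is called. In practice the summaries would come from whatever generates them and the output would be loaded into the documentation platform by a separate step.

```python
def render_catalog_entry(workflow):
    """Render one workflow's metadata as a uniform block of text.

    The dictionary keys used here are hypothetical field names.
    """
    sources = ", ".join(sorted(workflow["sources"])) or "none recorded"
    lines = [
        f"Workflow: {workflow['name']}",
        f"Owner: {workflow['owner']}",
        f"Summary: {workflow['summary']}",
        f"Data sources: {sources}",
        f"Link: {workflow['url']}",
    ]
    return "\n".join(lines)


def render_catalog(workflows):
    """Render every workflow, sorted by name, separated by blank lines."""
    ordered = sorted(workflows, key=lambda w: w["name"].lower())
    return "\n\n".join(render_catalog_entry(w) for w in ordered)
```

Because one function writes every entry, a change to the template updates the whole catalog on the next scheduled run.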
[00:21:56] Megan: Yeah, that's great. Thanks for sharing a little bit about that use case. I'm excited to see where that goes for you guys. And I think that Workflow Summary tool is super exciting, and automating the boring parts of being a data analyst or working in data science or machine learning really is exciting for me personally, and I think it will open up
opportunities for people in the field to just continue to tackle the most interesting, the most challenging problems.
[00:22:28] Wayne: I agree. Yeah, it's definitely an area we're very excited about.
[00:22:32] Megan: Great! Well, it's been really nice to have you on the show today. Thanks so much for sharing your knowledge with us about best practices, about data pipelines, about the kind of cutting edge things you guys are doing with Alteryx.
[00:22:45] Wayne: It's a pleasure to be on. Thank you, Megan, for your time today.
[00:22:48] Megan: Thanks for listening. To check out topics mentioned in this episode, head over to our show notes on community.alteryx.com/podcast. See you next time.