Alter Everything Podcast

A podcast about data science and analytics culture.

Alteryx Community Team

Alteryx workflows are already fast, but the new architecture behind the Alteryx Engine, Alteryx Multi-threaded Processing (AMP), takes data processing performance to a whole new level. Sr. Manager of Community Content, Neil Ryan, sat down with Principal Engineer, Adam Riley, for a peek behind the curtain of the AMP development process, and a chat about “the why” behind this massive new release that was years in the making.

 

 


Panelists

Neil Ryan - @NeilR , LinkedIn
Adam Riley - @AdamR , LinkedIn, Twitter

 


Topics

 

 

Alteryx legends discussing a legendary engine

 


Transcript


NEIL: 00:00

[music] Welcome to Alter Everything, a podcast about data science and analytics culture. I'm Neil Ryan, and I'll be your host. I sat down with an Alteryx legend, Adam Riley.

ADAM: 00:13

I'm Adam Riley. I'm a principal software engineer here at Alteryx, and my focus in that time has been working on the Alteryx engine.

NEIL: 00:23

We had a chat about Moore's Law and AMP, the next generation Alteryx engine. Plus, Adam shares some behind-the-scenes tidbits from his nine-plus years at Alteryx. Let's get started. Cool. Now, I think that you're somewhat of a legend in the Alteryx community because of the CReW macro pack. Can you talk a little bit about that?

ADAM: 00:53

I can, yes. Yeah. I think the CReW macro pack is what I seem to be famous for [laughter]. At the conferences, people come up to me, "Oh, you're the CReW macro guy. Oh, so glad to meet you." I'm like, "Yeah, yeah. And all that work I did on the engine, as well." But it's the CReW macros that make people excited.

NEIL: 01:10

The CReW macro pack is a free set of macros that you can install alongside the tools that come out of the box with Alteryx. They're incredibly popular among Alteryx users, so understandably this is Adam's claim to fame.

ADAM: 01:22

And so, I guess, the CReW macros came about I think through the Inspire Conferences, really. So, in fact, I think the very first CReW macro was written 10 years ago at the first Inspire I attended, in the solutions centre. So me and a few Alteryx employees, I think it was over dinner we were talking about an idea and we're like, "Well, let's go to the solutions centre and see if we can build an Alteryx macro." And we did and--

NEIL: 01:51

Incidentally, I think this is the second episode where we've talked about an Alteryx tool being built on the spot in the solutions centre at our annual Inspire Conference.

ADAM: 02:00

So I think I built a few of these macros, so over time I found them useful and I thought maybe the other people would. And so started the CReW macros as a way of sharing those out with a wider community and trying to get community involvement with building them. So yeah. And they're very widely used today, and I know people enjoy them a lot, and--

NEIL: 02:22

Very cool. Do you remember what the very first macro you wrote was for the CReW macro pack?

ADAM: 02:28

Yes. The very first one was the Google Street View macro, which I think is still in there today. So I mean, it basically used the Google APIs to take a lat-long. It took those coordinates, dropped them into a [web call?], called out to the Google Street View API, and captured the image of what the street looked like at that particular position. And then brought it back as a report snippet. So then you can imagine you were doing some sort of report for potential store locations or existing stores. And you'd get a report showing you the street view of what it looked like in that neighbourhood, for each of those locations.

NEIL: 03:06

Very cool. And now it's grown to what? A few dozen macros that are all free to download and can add-on to the base Alteryx tools?

ADAM: 03:14

Yeah. I think maybe about 24 macros today. Yeah. And yeah, free to download. Available at chaosreignswithin.com.

NEIL: 03:24

If you didn't catch that, you can find the CReW macro pack at chaosreignswithin.com. It might sound like a random phrase, but it was actually an error message that would pop up from time to time in the earliest days of the Alteryx software.

ADAM: 03:39

So I'm no longer the maintainer. I've passed that across to another legend in the Alteryx community, Mark Frisch, or MarqueeCrew as he is known in the community. So yeah, I'd found that my Alteryx life was taking up a lot more time and I didn't have the time to give. So I passed it across to the community, and yeah, Mark is running with those now and they're still going strong.

NEIL: 04:02

Very cool. So in this previous life as an Alteryx user, before you joined Alteryx, were you a software developer back then?

ADAM: 04:13

No. So I started my career as a data analyst. I did a mathematics degree at university. After university, I was like, "What do I do now?" Fell into accounting for about six months and realised I absolutely hated that. Yes, so then I saw this job advert to be a data analyst for Experian in Nottingham. And in actual fact, the job title was data analyst and when I applied for it, I think I imagined I'd be doing more sort of statistical-based things. So sort of the stats stuff we're doing in R and Python today, and SAS, which was the main product they used then. But when I actually got to the job, it was more what a lot of our users do today: getting data into the right format, sort of manipulating data from one thing to the other. So I'd always programmed as a hobby, and we did a lot of Visual Basic scripts at Experian and I dabbled with C# a bit, so I'd done some programming but that wasn't my main job. And I had written an Alteryx scheduler at Experian. This was before the server product that we have today existed, which lets you schedule workflows.

ADAM: 05:25

So I'd used the Alteryx APIs and C# to write an Alteryx scheduler. So you could schedule an Alteryx workflow and it'd get run out on Experian's big servers that they had for the big data sets. But yeah. So formal software development, I didn't have. So yeah. I wanted to get into that area and my employers at Experian knew I was trying to get into that field. So I was trying to make the move sort of internal to Experian, but the difficulty with applying for a software developer job is they always want experience in software development and I didn't have a lot of that. So I'd applied for a few jobs external to Experian as well and, again, got turned down because, well, I hadn't actually got any experience as a software developer. This is a bit of a sort of chicken and egg type thing. So I didn't actually apply for a software engineer job initially at Alteryx. Yeah. I think it might have been a job on the data team, so building the data sets. And I said, "Well, yes, I'm going to go for that. I want to work for Alteryx. I want to be part of it." So I sent off an email sort of applying for it. Straight away, I get an email back from Ned, the CTO at the time, "Call me." And I was like, "Whoa. That wasn't what I was expecting [laughter]."

NEIL: 06:47

That's cool. The Ned that Adam is referring to is Ned Harding, co-founder of Alteryx and the architect of the Alteryx engine.

ADAM: 06:56

It was kind of cool but I wasn't quite ready right now to have a conversation with a CTO. I was like, "Okay. Well, yeah." I was like, "I'm serious about this so I better give him a call." Yeah. So I phoned him up and Ned was like, "So data team, is that really what you want to do? I can see you doing some different roles at Alteryx." And I think in his mind, he was thinking I could be a sales engineer but he just sort of asked the question, "Yeah, what do you want to do?" I was like, "Well, if you're asking, I want to be a software engineer and I want to work on engine." He was like, "Okay." He was like, "So have you got any experience in C++?" I was like, "Well, no, no." And yeah. So I mean, he told me afterwards that he figured he'd give me six months and if I didn't work out, they'd be able to find me a job as a sales engineer.

NEIL: 07:43

Okay. That's cool.

ADAM: 07:45

But obviously, nine years later, I'm a principal software engineer still working on the engine so I think I can say it worked out okay.

NEIL: 07:52

Very nice. So day one, you're in what? C++ boot camp or something like that?

ADAM: 07:57

No, pretty much self-taught on the job actually. The advantage I had was because I knew the product incredibly well, then if I wanted to make a change to it or I wanted to add something to it I was like, "Well, what I want to do here is very much like how this tool works over here. So I'll go and look at that tool and see how they do it there and then I can learn by looking at how it was done by everyone else, effectively." So yeah. I learned C++ on the live Alteryx code base. That's my confession.

NEIL: 08:31

Cool. Well, yay. So you joined before the original server product was released as they were building it, I guess.

ADAM: 08:41

Yes. Yes. They were building that out.

NEIL: 08:43

That makes a pretty good segue here because just this week as we were recording kind of a new next-generation server product was just released, Alteryx Analytics Hub. But what we really wanted to get talking about today was AMP. So let's get into that. What is AMP? [music]

ADAM: 09:12

Yeah. So AMP is the project I've been working on for the last four years now. So perhaps people will have heard of it under its code name, which was E2. But effectively, AMP is the next-generation Alteryx Engine. So the Alteryx Engine re-imagined for modern technology, modern hardware, and designed to massively scale across multiple cores. [music]

NEIL: 09:41

So I know this project has been years in the making and you've mentioned Inspire Conferences before, where at least in the past, pretty much all of Alteryx goes and gets to meet Alteryx users in person. And I've been to over five of those so far, and I've never actually met a user that complained that the Alteryx Engine was slow. So can you just give kind of a little background of why-- how did the idea behind E2 take shape? Why did we need E2? Who came up with the idea?

ADAM: 10:25

Yeah. No, it's a great question. And yeah. And a very good point, as you say: when you're in the solutions centre at Inspire, people don't come and say, "That engine's too slow." It's not a criticism we get [laughter]. I mean, many times, it's the opposite: "Wow. This is great. This engine's so fast." Yeah. So this came about-- I mean, so Ned, the sort of former CTO, was the sort of original brains behind the start of this project. But I guess to explain it more fully, we need to go back and talk about Moore's law. [music]

ADAM: 11:02

So Moore was an early pioneer in Computer Science history. So he worked back in the '60s, I think. '60s, '70s. And Moore's law-- and it's called law but it's actually more of an observation. And what he noticed was as time went on, the number of transistors on your CPU was roughly doubling every year. He later revised that to every two years. But he noticed this, as I say, early on, back in '65 that he first observed this. And as the decades went on, this continued to hold true. And why that's important to you as a computer user, and me as a software developer is the number of transistors on a computer chip is very much-- so the transistor is what does the actual calculations, when it gets down to the zeros and ones right at the [bottom?]. I mean, it's [crosstalk]--

NEIL: 12:07

Adam just mentioned transistors, but he's about to go even deeper with threads, cores and other stuff that's important to know about in order to understand how the Alteryx engine works.

ADAM: 12:17

Zeros, ones right at the [bottom?], it's the transistors doing it. And so the more transistors you've got, the more calculations you can do in a given time period. And the faster your software runs. Well, that's the theory. So Moore's law carried on nicely through the '70s, '80s, '90s. Sort of up to about 2005. And as I say, it was a law and, some people would argue, a sort of self-fulfilling prophecy. Because Moore set down this law that the hardware manufacturers almost took as a target of, "Okay. Well, in two years' time we need to have twice as many transistors."
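
The doubling Adam describes compounds quickly. As a rough worked example (a back-of-the-envelope sketch, not a claim about any particular chip), the Python below computes the growth factor implied by doubling every two years between 1965 and 2005, the two dates mentioned in the conversation. The function name and its parameters are illustrative only.

```python
def transistor_multiplier(start_year: int, end_year: int,
                          years_per_doubling: float = 2.0) -> float:
    """Growth factor in transistor count under a fixed doubling period."""
    doublings = (end_year - start_year) / years_per_doubling
    return 2.0 ** doublings


# 40 years at one doubling every two years is 20 doublings.
print(transistor_multiplier(1965, 2005))  # 1048576.0
```

Twenty doublings is roughly a million-fold increase, which is why the levelling off of per-core clock speed around 2005 was such a change for software developers.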

NEIL: 12:55

Maddy, our producer, says it's time for an analogy. So inspired by the internet as a series of tubes, and if you don't know what I'm talking about, just Google, "The internet is a series of tubes." My analogy will also use tubes or pipes. So as water usage in houses goes up over the years, they need to make the pipes bigger. And to be clear, I'm making this up. But in my head, if you want more people in your house to be able to shower at the same time with solid water pressure, you need a bigger pipe coming into your house. And in the year 2000, if you wanted your computer to run faster, you'd need more transistors on your chip. Back to you, Adam.

ADAM: 13:34

But yeah. About 2005, hardware manufacturers began to run into a problem. And they were beginning to sort of hit some of the actual physical characteristics of just how many transistors you could physically fit onto one of those chips before you were getting problems with heat capacity and things. And so the answer, because we've got Moore's law, we've got to hit these targets to get the transistors on, was to go multi-core. So the number of transistors continues to double. But what you'll notice around 2005 is that the clock speed, so that's actually how fast the chip runs for each of those cores, levels off. Because the chips can't run any faster than that.

NEIL: 14:16

So that was, sorry to cut in, that was around 2005 that you're saying they started going multi-core?

ADAM: 14:24

Yes. Which, coincidentally, was exactly when Ned sat down to write the first line of the original Alteryx engine.

NEIL: 14:32

Okay. Yeah. That's what I was-- in what year did you join Alteryx?

ADAM: 14:35

I joined in 2011.

NEIL: 14:38

Okay. All right, keep going.

ADAM: 14:41

Yeah. So because Ned started writing in 2005, he wasn't writing for-- well, so the chips certainly existed then. Most computers, most user computers, had a single core. If you were lucky you had a dual-core and maybe your server was quad-core.

NEIL: 15:01

So over time, if you have a couple kids you just move into a bigger house with a bigger pipe. And still everyone can shower in their bathrooms without sacrificing water pressure. But at some point, the pipe gets to the point where if it's any bigger it can't fit onto the semi-truck so you can't even get it to the construction site. What do you do? Add a second pipe. A second core.

ADAM: 15:22

So the original Alteryx engine was designed to work on those type of CPUs. The big limitation of the old engine, or the current engine I guess I should say, was that the main data pump, so what's actually pushing those records through the system as the workflow is running, runs in the main thread. So that's sort of the main program logic in a computer. So what that means is it can't take advantage of these multiple cores. But that's fine because many of the tools in the existing engine are able to run background threads. So, for example, the input and output tool, they have background threads that will pull the data from disk, and another background thread will push the data back to disk at the end. And in actual fact, in many workflows, the disk throughput is the sort of limiting factor anyway. So that works fine when you've got two or four cores because you've got the main thread using one of the cores pushing the data through. You've got these background threads taking data to and from disk. And then things like the sort tool, the join tool, they were multi-threaded in the old engine, so they were able to use more of the cores. And it was all good and, as you say, we've not had complaints about the engine's speed. It was a fantastic design. It's served us very well up to this point.
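
The layout Adam describes (one main thread pumping records through the tools, with background threads handling disk reads and writes) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the Alteryx engine's code: `reader`, `writer`, `run_workflow`, and the queue size are all hypothetical names and values, and the "tools" are plain functions.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the record stream


def reader(source, out_q):
    """Background thread: stands in for an input tool pulling rows from disk."""
    for record in source:
        out_q.put(record)
    out_q.put(SENTINEL)


def writer(in_q, sink):
    """Background thread: stands in for an output tool pushing rows to disk."""
    while True:
        record = in_q.get()
        if record is SENTINEL:
            break
        sink.append(record)


def run_workflow(source, tools):
    """The calling (main) thread is the single data pump for every record."""
    in_q = queue.Queue(maxsize=1024)
    out_q = queue.Queue(maxsize=1024)
    sink = []
    threads = [
        threading.Thread(target=reader, args=(source, in_q)),
        threading.Thread(target=writer, args=(out_q, sink)),
    ]
    for t in threads:
        t.start()
    while True:
        record = in_q.get()
        if record is SENTINEL:
            out_q.put(SENTINEL)
            break
        for tool in tools:  # all tool logic runs here, on one core
            record = tool(record)
        out_q.put(record)
    for t in threads:
        t.join()
    return sink


print(run_workflow(range(5), [lambda x: x * 2, lambda x: x + 1]))  # [1, 3, 5, 7, 9]
```

However many cores the machine has, the loop in `run_workflow` only ever occupies one of them, which is the limitation Adam goes on to describe.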

ADAM: 16:48

But wind forward to 2016 when we started thinking about this E2 project, and you're beginning to look at CPUs that are coming out with 16 cores, 32 cores, 64 cores. And the problem when you've got this vast number of cores is that the existing engine just can't make use of all that CPU power that's given to it. So it means if you want a workflow to run faster, just buying a bigger chip doesn't help you. And in some cases, it actually runs slower on these bigger chips because if you've got 64 cores, the individual clock speed on each core tends to be a little bit slower. And so we've seen cases where you put an engine onto this big beefy server that's got 64 cores and it doesn't actually run any faster than running on a laptop.
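
One way to see why a 64-core server can lose to a laptop is a simple Amdahl-style model, which is swapped in here for illustration and is not something Adam names. It assumes a fixed amount of work, of which only a fraction can be spread across cores; the rest runs on one core at that core's clock speed. Every number below (cycle count, parallel fraction, clock speeds) is made up for the example.

```python
def runtime_seconds(work_cycles: float, parallel_fraction: float,
                    cores: int, clock_hz: float) -> float:
    """Idealised runtime: serial part on one core, parallel part split evenly."""
    serial = work_cycles * (1.0 - parallel_fraction)
    parallel = work_cycles * parallel_fraction / cores
    return (serial + parallel) / clock_hz


WORK = 3e11           # hypothetical total cycles for a workflow
PARALLEL_SHARE = 0.2  # hypothetical: most work sits in the single main thread

laptop = runtime_seconds(WORK, PARALLEL_SHARE, cores=4, clock_hz=3.0e9)
server = runtime_seconds(WORK, PARALLEL_SHARE, cores=64, clock_hz=2.0e9)

print(f"laptop: {laptop:.1f}s, server: {server:.1f}s")
print("server is slower" if server > laptop else "server is faster")
```

Under these assumed numbers the server takes longer, because the serial portion dominates and runs on a slower core. Raising the parallel share is the only lever that lets the extra cores pay off.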

NEIL: 17:49

So to extend this tortured metaphor a little further. At first adding more pipes to your house works fine because your plumber saw this coming and equipped the house to be able to handle four incoming pipes, just like the sort and join tool can make use of multiple cores. But now in quarantine everyone is at home at the same time, in every house, all through the neighbourhood, all showering at the same time. It's going to take some work to redo the plumbing on all these houses to make use of all the pipes. And that's the big ass plumbing job that Adam and the rest of the Alteryx software developers just finished. Now everyone can shower whenever they want and everyone smells really nice.

ADAM: 18:29

So the goal with E2 was to set out and-- sorry, the AMP engine, to set out and re-imagine how that architecture would look and how we could make use of machines that had 64 cores, and move in the direction that the hardware's going. So in the future when you get even more cores, we're able to use those cores and make the workflows run faster still. So that was the key goal.

NEIL: 18:55

Very cool. So this is back in 2016. I guess the idea originated with Ned. Where did you come in on the project?

ADAM: 19:10

Yeah. So I guess that was a funny story. So I joined the company in 2011 and I moved out to Boulder, Colorado where the tech headquarters was then. And then two and a half years later, we were going to have a baby so we moved back to the UK to be near family. And I've been working remotely in the UK since then. So I think Ned had been talking about this idea of a new engine sort of on the back half of 2015. And it was decided start of 2016 we were going to set up a small team, and we were going to build out a proof of concept and see whether these ideas would work, and whether they'd actually help to make things go faster. So it must have been near the end of 2015, Ned came to me and was like, "Oh, so would you like to work on the new E2 engine?" as it was called then. And I have to say my initial reaction was like, "Actually I don't think I do." So in hindsight, things could have gone very differently for my career, but. And my thing at the time was, "Well, I'm enjoying working on the current engine. I like creating features that users can see." And also being remote, I was like, "Well, if I disappear into this research project for two years, are people going to forget who I am?" I'll re-emerge and people are like, "What were you doing for--?" But I didn't get my way, and I ended up on the research project and it turned out to be a very good thing, so.

NEIL: 20:46

Did he not take no for an answer? Or did you change your mind?

ADAM: 20:49

He didn't take no for an answer [laughter]. Yeah. So 2016 we started. There were three of us. It was me, Ned, and Scott, a developer who worked with us then. 1st January 2016 we sat down, wrote the first lines of code of what was to become the AMP Engine. So I guess once we'd sort of proved out that first proof of concept, and decided this was going to work as a thing, it was then time to bring in the production teams and sort of get more people involved in it. And over the course of the project, we've had a large number of developers working on it, across multiple continents in fact. So during the timescale of the project, we actually opened an office in Kiev, Ukraine. And there's today two development teams out in Kiev who work on the AMP Engine, along with teams back in our tech centre in Broomfield, Colorado. So yeah, now a large number of software developers who have all contributed to the journey and got us to where we are today. So yeah. Thanks, everyone, for being part of it.

NEIL: 22:00

Good team effort.

ADAM: 22:02

So yeah. I guess the first six months to a year was proving out the concepts. And the initial benchmark that we took was the join tool, and the goal was just, "Can we make the join tool run faster in a new engine than it does in [inaudible] one?" Because I guess the difficult thing for software engineers and multi-threaded systems is that writing multi-threaded code is extremely hard, because it's not for free. So you've got all these multiple cores, but to be able to utilise them there's a cost. So every time you spin something off in a new thread and put it onto a different core, there's some CPU cost that's involved in doing that. And then you've got to keep track of when the results come back because so many things in a multi-threaded world don't necessarily run in sequence. And I think they'd done previous experiments at Alteryx of trying to multi-thread more of the tools. So I think there's a story of somebody who looked at the formula tool and said, "Well, every time a record comes through the formula tool, why don't we just process that record in a new thread, and we can make use of all these CPUs. Life'll be great. It'll all go faster." So they did that. And great, the CPU lit up 100%. We were using all those cores. Fantastic. But the actual workflow speed, the wall-clock speed, was slower than doing it single-threaded. And the reason for that is an Alteryx formula, so calculating a new field like a multiplication, that's easy for a computer. It can do that in no effort at all. And so the amount of work got swamped by the cost of putting it into a new thread. And so although you used all the CPU, it was just wasted effort and you didn't actually get any faster.
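
The formula-tool story comes down to simple arithmetic: if handing a record to another thread costs far more than the calculation itself, every core can be busy while the workflow still finishes later. The sketch below is an idealised cost model in arbitrary "work units", assuming perfect load balancing; the overhead of 50 units per task and the 16 cores are invented figures, not measurements from Alteryx.

```python
def serial_time(n_records: int, work_per_record: float) -> float:
    """Single-threaded cost: just the useful work."""
    return n_records * work_per_record


def per_record_threaded_time(n_records: int, work_per_record: float,
                             overhead_per_task: float, cores: int) -> float:
    """One task per record: useful work plus scheduling overhead, split over cores."""
    total = n_records * (work_per_record + overhead_per_task)
    return total / cores


N = 1_000_000
print(serial_time(N, 1.0))                           # 1000000.0
print(per_record_threaded_time(N, 1.0, 50.0, 16))    # 3187500.0
```

With these assumed costs, sixteen fully loaded cores take more than three times as long as one core doing only the useful work.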

NEIL: 23:56

That's interesting because you'd think, yeah, why didn't you just do this before? Well, we tried and it didn't work. Yeah.

ADAM: 24:05

Right. Exactly, yeah. And so AMP actually introduced this concept of record packets. So we're no longer processing data record by record, but we deal with records in what's called a record packet. So that's a fixed-size amount of memory. So let's say it's four megabytes. And you get four megabytes' worth of records, and they will get scheduled on different cores. So then you're only paying the cost of multi-threading per four-megabyte chunk of records rather than per record, and that gets you those benefits.
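
A minimal sketch of the record-packet idea, assuming fixed-width records and the four-megabyte figure Adam uses as an example. The names `make_packets` and `run_formula` are hypothetical, and Python threads are used only to show the scheduling granularity (Python's global interpreter lock means this will not show a real CPU speed-up the way a C++ engine would).

```python
from concurrent.futures import ThreadPoolExecutor

PACKET_BYTES = 4 * 1024 * 1024  # "let's say it's four megabytes"


def make_packets(records, record_bytes, packet_bytes=PACKET_BYTES):
    """Group fixed-width records into packets of at most packet_bytes."""
    per_packet = max(1, packet_bytes // record_bytes)
    for start in range(0, len(records), per_packet):
        yield records[start:start + per_packet]


def run_formula(records, formula, record_bytes, workers=4):
    """Schedule one task per packet instead of one task per record."""
    packets = list(make_packets(records, record_bytes))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so record order is kept
        results = pool.map(lambda packet: [formula(r) for r in packet], packets)
        flat = [value for packet in results for value in packet]
    return flat, len(packets)


rows = list(range(100_000))
doubled, task_count = run_formula(rows, lambda x: x * 2, record_bytes=100)
print(task_count)  # 3 tasks instead of 100000
```

The scheduling cost is paid three times here rather than a hundred thousand times, which is the compromise Neil summarises next.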

NEIL: 24:40

Cool. So it's a compromise. You said before it's not free. Nothing's free when you're multi-threading so that's the compromise, you packet them.

ADAM: 24:51

Yeah.

NEIL: 24:52

Very cool. Yeah. I know there's a no free lunch theorem in the data science world seems like [crosstalk]--

NEIL: 24:57

We've got a good blog about no free lunch that we'll link to in the show notes.

ADAM: 25:01

Well, and what was interesting with the software engineers is-- so there's a phrase from a famous member of the C++ community of, "The free lunch is over." And what he meant by that was in that early days of Moore's law before we got this run off on the clock speed, there was a free lunch, because every two years, computers got faster. So if you wanted your code to run faster, you could just sit back and wait a couple of years, and the hardware manufacturers would do all the work for you. So yeah. It was a good time to be a software engineer. But yeah. The free lunch is over.

NEIL: 25:36

So you mentioned the first-- so you started early 2016 with the first lines of code on this new engine E2 and you were working towards the first benchmark for the join tool. So how did that go? How did that first benchmark go? Was that [crosstalk]--?

ADAM: 25:54

Yeah. So I think it must've been maybe six or nine months in, we got the join tool working. So, of course, for the join tool it isn't just the join tool. You've got to have some data to pump into it. So we had a CSV input tool at that point as well. So the CSV input tool, a join tool, and I guess we had an output tool so we could actually see the results coming out and check we were getting the correct answers at the end. So those were the only three tools we had in the engine at that point. Yeah. And we got the benchmark running and I think at that point we were, I want to say, 20 times faster than the existing engine. So don't get too excited because that was an early speed up. But yeah. But enough to prove that we had something real there, and this was something that we wanted to pursue. So this was on a big machine, I think a 64-core machine that one was. And yeah. The speed up has come down since then because we had a very lightweight CSV reader. We didn't have the full functionality. But yeah. We were getting a multiplier speed up. But the best bit was that what we had, and what we have today, is if you add more cores then you can increase the speed of it more, dependent on your workflow, so. Yes. Caveat to all speed claims about software: test on your own workflows and, yeah, your own hardware.

NEIL: 27:24

That's really cool. You're getting to the part of the story that I remember bits and pieces of. I think back then in 2016, we'd do big releases just once or twice a year. And so after each release then the whole product and development world would get together, and do a product kick-off to talk about what's coming in the next release. And I remember Ned getting up and demoing that join tool, and just showing how it was so much faster than E1 and there was amazing excitement in the room. I think it was one of the coolest Alteryx meetings I've been a part of. [music]

ADAM: 28:09

Yeah, no. Exciting days, it definitely was.

NEIL: 28:15

So how was it working so closely with Ned on that? I imagine, being you were one of the first dozen or so engineers, you've worked closely with him for years, but for that project just the three of you.

ADAM: 28:32

Yeah, no. It was fantastic. I mean, yeah. 2016 was a great year. It was just, head down, coding. Every day we were writing C++, solving interesting problems. Yeah. I mean, a huge mentor to me and taught me so much about programming and C++. So no, it was really good.

NEIL: 28:57

And you got up on stage with them at Inspire, right? Was that 2017?

ADAM: 29:02

2017. So yeah. So midway through 2016, we had this benchmark. I think right at the end of 2016, we had it integrated with Designer so you could actually press play in Designer and run on the new engine. But still, we had very few tools at that point. We'd probably added maybe a few more of them. Maybe there were 10 tools by the end of 2016. So then the challenge we had was how do we get-- 10 tools does not make Alteryx as we all know [laughter]. So how do we get the rest of those tools and what are the next steps to get us through to release? Obviously, it's taken us-- well, from that point, it took another three years to get to the point we could release it out to our customers. And yeah. Another challenge was because there was a complete re-architecture of how the engine worked, it meant each of the individual tools needed to be rewritten effectively from scratch. I mean, obviously, we reused all the code and logic we had before, but to work in this new multi-threaded way. So--

NEIL: 30:06

Yeah. I guess, as you were saying, multi-threaded code is hard, so I guess there's just no easy way to redo everything.

ADAM: 30:13

Yeah, exactly. So yeah. I think I remember Ned saying, I think it was probably around midway through 2017. He was like, "If we were a start-up, we would be releasing this by now, but we're not." We've got customers who use our product and we've got to support backwards compatibility with the workflows out there. And we didn't get 100% compatibility because the engine works in a fundamentally different way. So you can't take all E1 workflows and run them in E2 without any changes, but we got a long way there. Yeah. So 2017, I think, was the first time we announced it sort of publicly to our users that we were working on this new engine. So yeah. I mean, one of my favourite memories was the end of the UK Inspire that year, sort of probably the back half of 2017. And Ned closed the conference with a talk on the E2 engine and I got to get up on the stage afterward with him. It was the main stage with the full conference, and we took questions and answers which was, yeah, a great experience.

NEIL: 31:21

That's awesome. Yeah, I remember. I think you and Ned were doing a question and answer session. I think this was a different one, it might have been in the US, just on how the engine works, and those Inspire sessions are always jam-packed. I was in the back standing because there weren't any seats left.

ADAM: 31:46

Yeah. I always enjoy those sessions.

NEIL: 31:48

Well, let me ask you this, and you touched on this a little bit just in terms of you had 10 tools at one point, and 10 tools does not make the Alteryx tool palette. Can you just speak a little bit more about-- Alteryx at this point has been around for a while, Alteryx Designer. I don't think it was called Designer at first, but more than a decade at this point. So with a change this big, a fundamentally new data processing engine for a data analytics platform, how do you ease that in without kind of disrupting things for the majority of current users?

ADAM: 32:40

Yeah. And that was one of our biggest challenges really was that compatibility with all those existing workflows that are out in the world today. We don't want our users to have to go out and rebuild everything from scratch. That's absolutely not an ideal situation. And so there's a couple of things we've done. So the first thing was, how could we get this product in the hands of our users without having to rebuild-- so as I said, end of 2016 we had 10 tools. I think in total now I want to say it's 250 tools? I don't know. There's a lot of tools.

NEIL: 33:16

Yeah. Over 250. Yeah.

ADAM: 33:18

Yeah. So we knew we weren't going to have--

NEIL: 33:21

Analytic building blocks, sorry [laughter].

ADAM: 33:24

Analytic building blocks, indeed. So we weren't going to have time to rebuild all of those tools before we could launch it. So what the new engine actually does-- so I think we're up to 50 or 60 converted tools now. So we started with the most popular tools, the most widely used tools. But obviously, there's this big long tail of tools that some people find absolutely essential to their daily work, but some people would probably never have heard of. So it's not like we can remove those from the product. So the Arrange tool is one, Make Columns. Yeah. So these are down the long end of the tail of usage. So what the new engine actually does is, although we haven't implemented all of the tools in the new engine today, if a tool hasn't been implemented and you're using it in a workflow that's set to run under the new engine, then for that particular tool that hasn't been converted, it actually spins up a copy of the old engine behind the scenes and uses the old engine to do the data processing for that particular tool, or if it's a group of tools, for that group of tools. [music] This doesn't come for free. So there's a cost because the data formats are different between E1 and E2. So there's a conversion cost going back and forth, but the challenge was to get to a point where we got enough of the tools converted that, hopefully, for the majority of workflows, the cost that you pay going back to E1 gets outweighed by the benefits of going multi-threaded in E2.
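
The fallback Adam describes can be pictured as splitting a workflow into runs of converted and unconverted tools, with a data-format conversion at each boundary. The sketch below is a simplification under stated assumptions: it treats a workflow as a straight line of tools (real workflows are graphs), and the `AMP_TOOLS` set is a hypothetical stand-in, not the actual list of converted tools.

```python
from itertools import groupby

# Hypothetical set of converted tools, for illustration only
AMP_TOOLS = {"Input", "Filter", "Formula", "Join", "Sort", "Output"}


def engine_for(tool: str) -> str:
    """Converted tools run on AMP; anything else falls back to the old engine."""
    return "AMP" if tool in AMP_TOOLS else "E1"


def plan_segments(workflow):
    """Group consecutive tools by the engine that would run them."""
    return [(engine, list(tools)) for engine, tools in groupby(workflow, key=engine_for)]


def count_conversions(segments) -> int:
    """Each AMP/E1 boundary implies one data-format conversion."""
    return max(0, len(segments) - 1)


workflow = ["Input", "Formula", "Arrange", "Make Columns", "Sort", "Output"]
segments = plan_segments(workflow)
for engine, tools in segments:
    print(engine, tools)
print("conversions:", count_conversions(segments))  # conversions: 2
```

Grouping adjacent unconverted tools, as in "for that group of tools", means the two fallback tools here cost two conversions in total rather than two each.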

ADAM: 34:59

And obviously with time, we'll continue with each new release to convert more of those tools, and gradually sort of phase those tools across. And the other thing to mention, of course, is we've not just released the new engine and said to users, "Here you go. Off you go." So for the first release, it's opt-in on a workflow-by-workflow basis. So if you don't go and check the checkbox that says, "Use the AMP engine for this workflow," it will still run just as it did on the last release and all your workflows are going to work the same. If you do want to try it then, yeah, go into a workflow, check the checkbox that says, "Use the AMP engine" and run the workflow and then see what results you get. So as I say, not all cases are going to be faster. It's going to be faster generally when you've got bigger data sets. So we talked about the data being parallelised when it's in four-megabyte chunks. So if you've got less than four megabytes of data, I mean, the workflow was fast enough before, we don't need to go parallel for that. But likely you're not going to see a speed-up. Where you're going to see the speed-up is where you've got gigabytes of data going through, and you've got a machine with multiple cores that can take advantage of it.
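[Editor's note: a minimal sketch of why small inputs see no speed-up. It assumes only the four-megabyte chunk size mentioned above; the function names and the newline-counting stand-in for real work are hypothetical and do not reflect how the AMP engine is implemented. Input that fits in a single chunk leaves nothing to run in parallel, while larger input fans out across a worker pool.]

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_BYTES = 4 * 1024 * 1024  # the four-megabyte chunk size mentioned above


def split_into_chunks(data, chunk_bytes=CHUNK_BYTES):
    """Cut a bytes object into consecutive chunks of at most chunk_bytes."""
    return [data[i:i + chunk_bytes] for i in range(0, len(data), chunk_bytes)]


def count_newlines(chunk):
    """Stand-in for per-chunk work: count newline-terminated records."""
    return chunk.count(b"\n")


def count_records(data, chunk_bytes=CHUNK_BYTES, workers=4):
    """Count records, using a worker pool only when there are several chunks."""
    chunks = split_into_chunks(data, chunk_bytes)
    if len(chunks) <= 1:
        # Less than one chunk of data: nothing to parallelise.
        return sum(count_newlines(c) for c in chunks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_newlines, chunks))
```

[Python threads are used here only to show the shape of the idea; a real engine would use native threads across cores.]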

NEIL: 36:15

Very cool. And we're getting a little bit into the nitty-gritty of how the new AMP engine works under the hood. Just want to mention that you're going to be writing a series of blogs on the Alteryx Engine Works blog that goes into much more detail, right? I think by the time this episode drops, the series will have started so we'll certainly link to that in the show notes. Well, thanks for joining us, Adam. Truly enlightening. You mentioned earlier in the conversation that when you were trying to decide whether or not to work on the AMP engine in the first place four years ago, you were worried that if you did you'd just go into a research hole and not be heard of again for a couple of years. Well, I guess you're out now and you're on the podcast so people know you [crosstalk].

ADAM: 37:12

I am now. Happy to be here. Yep. No, thanks for having me. Yeah. It's been a lot of fun.

NEIL: 37:18

All right. Thanks again, Adam. Bye.

ADAM: 37:21

Thanks. Bye.

NEIL: 37:22

Thanks for listening. Check out community.alteryx.com/podcast for links to Adam's blog series about the AMP engine and to join in on the conversation. You can also share on social media using #altereverythingpodcast. Catch you next time. [music]

NEIL: 37:46

One question I had, just in terms of that nitty-gritty. You mentioned how E2, the AMP engine, is designed from the ground up to work with more modern hardware. And you went into a lot of depth on the multi-core processors. What about solid-state disks? How is it optimised for those, as opposed to the old spinning hard drives?

ADAM: 38:13

Yeah. That's a great question. Yeah. Something I probably should have talked about actually. That was another thing we looked at in terms of changes when we started to re-imagine. Because, honestly, when the original engine was started, solid-state disks didn't exist. It was a spinning disk with a physical arm that moved across that spinning disk to read the data off it. And what that meant-- and this is a question that actually comes up a lot from the community at Inspires. And the question is, if you've got a workflow that's got two inputs on, why does the existing engine read all of the first input first, and then all of the second input second? So the question is, why doesn't it read both of those at once, because wouldn't that be faster with multi-core machines? And the answer's two-fold. The first answer is, well, at its heart the E1 engine is single-threaded so we really can't. And two, when the engine was first imagined we really didn't want to, because with this physical disk with the moving head, it's much more efficient to read one file first and keep the head sweeping across the one file, and then read the second one, rather than the head trying to jump back and forth between the two files and read different sections. It would actually go slower.

NEIL: 39:27

Yeah. That makes sense.

ADAM: 39:28

Jump forward to today, the vast majority of people will have solid-state disks. If you don't have a solid-state disk - this is my public service announcement - please go out and bug your IT department and tell them you need a solid-state disk. Because it makes a big difference to both the existing engine performance and certainly the E2, the AMP engine, performance. And the difference with the solid-state disks is you're able to randomly access the data on that disk. So no longer have you got this head swinging back and forth, but you can say to the operating system, "I want this section of the file, this section of the file, this section of the file, this section of the file." And the actual CSV reader in Alteryx is very clever because that's exactly what it does. So when you fire up the engine it requests these multiple blocks off the disk all at once. The operating system returns them as soon as it can. And as soon as they come back, the AMP engine is working to stream those. So to get the data into the Alteryx record format and push them downstream. Which means we can get incredibly fast CSV reads. Yeah. It's just a great feature. And there's some really clever technology. If you start thinking about the actual practicalities of reading a CSV file, when you've just requested some random chunk of the file, there's some very interesting problems that had to be solved for that.
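[Editor's note: one of the practical problems Adam alludes to is that an arbitrary byte range of a CSV file rarely starts on a record boundary. The sketch below shows a common textbook way to handle that and is not the Alteryx implementation: each worker skips the partial line at the start of its range and owns every line whose first byte falls inside the range. It assumes no quoted field contains an embedded newline, which is exactly the kind of case a real reader has to handle. The names `read_chunk` and `read_csv_lines_in_chunks` are hypothetical.]

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_chunk(path, start, end):
    """Return the complete lines whose first byte lies in [start, end)."""
    lines = []
    with open(path, "rb") as f:
        if start > 0:
            # Step back one byte and discard through the next newline, so we
            # begin at the first line that starts at or after `start`.
            f.seek(start - 1)
            f.readline()
        while f.tell() < end:
            line = f.readline()
            if not line:
                break  # end of file
            lines.append(line.rstrip(b"\r\n").decode("utf-8"))
    return lines


def read_csv_lines_in_chunks(path, chunk_size, workers=4):
    """Request all byte ranges at once and stitch the lines back in order."""
    size = os.path.getsize(path)
    ranges = [(s, min(s + chunk_size, size)) for s in range(0, size, chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: read_chunk(path, r[0], r[1]), ranges)
        return [line for part in parts for line in part]
```

[Because a line that straddles a boundary is read in full by the range where it starts and skipped by the next one, every record is returned exactly once whatever the chunk size.]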


This episode of Alter Everything was produced by Maddie Johannsen (@MaddieJ).
Special thanks to @jesperwinkelhorst for the theme music track for this episode.