
Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
patrick_mcauliffe
14 - Magnetar

I remember the first time I installed Alteryx at work. I was so excited after running the trial on my computer at home, but when it came time to put my newly acquired license to use, it wasn’t the same experience. Sure, it was fast; but nothing like it was on my own computer. Almost like I was watching it in slow motion.

 

Rightfully, I assumed it was the hardware. The IT department was kind enough (after months of negotiating) to allow me to have a desktop in addition to the laptop I already had. They called it a “development” machine because it had *8* GB of RAM and a sweet AMD dual-core processor. Note the sarcasm. The year was 2014, quad-core was pretty much the standard, and 8 GB of RAM wasn’t anything special. I tried to explain that this application ran much faster on my machine at home (four cores and 32 GB of RAM), but there were costs associated with a “special order” and I couldn’t really quantify the cost-to-benefit ratio of the expense.

 

Thankfully, after a few months of working that machine to the max, the hard drive just gave up. It didn’t take much longer for the next one to fail too. That’s the thing about hard disk drives – they’re mechanical spinning disks, prone to failure under long periods of heavy use. This particular machine was running 24/7. Constant use, plus poor ventilation and heat = faster failure rate.

 

So, now that I’ve had a few years of working with Alteryx and have been lucky enough to experience it on a wide variety of machinery, I can tell you without a doubt that hardware matters, a lot.

 

For the sake of demonstration, I got ahold of the file that was the subject of my first use case for Alteryx. Well, not the exact file, but the current version. It’s a CSV containing all of the identifiers for healthcare providers in the US (the NPPES Downloadable File). In its zipped format, it's only about 600 MB, but unzipped it is greater than 6 GB. At 329 columns wide and 5.8 million rows long, it is well beyond what most self-service tools can handle. Alteryx, however, handled it without a problem.

 

Anyway, to recreate the difference in experience I had back in 2014, I got ahold of a laptop from a friend who works for a not-so-tech-savvy company. A Dell Latitude E5570, to be exact. It was the closest I could find to an elderly piece of equipment. While it does meet the “high performance” specs outlined by Alteryx (four-core CPU, 16 GB of RAM, and 500 GB of space on its primary drive), it really doesn’t give a high-performance experience.

 

The workflow I ran back in 2014 took the extracted CSV file, counted the number of unique identifiers per city/state, and joined the count back to each line by its address. Pretty simple – three tools. Four if you count the Browse. When we ran it on his laptop, it took 2 minutes and 49.667 seconds to complete.

 

So…slow.
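If you want to sanity-check this outside of Designer, here's roughly what that three-tool workflow does, sketched in pandas. The file name and the NPPES column names below are placeholders from memory, so verify them against the actual download before running:

```python
import pandas as pd

# Placeholder column names -- verify against the real NPPES header row.
CITY = "Provider Business Practice Location Address City Name"
STATE = "Provider Business Practice Location Address State Name"

# Input Data tool: pull just the columns we need from the extracted CSV.
df = pd.read_csv("npidata_pfile.csv", usecols=["NPI", CITY, STATE], dtype=str)

# Summarize tool: count unique NPIs per city/state pair.
counts = (
    df.groupby([CITY, STATE])["NPI"]
      .nunique()
      .reset_index(name="unique_npi_count")
)

# Join tool: attach the count back onto every source row.
result = df.merge(counts, on=[CITY, STATE], how="left")

# Browse tool: take a peek at the output.
print(result.head())
```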

 

Next up, we have a four-core machine with 32 GB of RAM and 500 GB on the primary drive. But this time the primary drive is an SSD.

 

It crushed this same workflow in just 1 minute and 4 seconds. More than two and a half times as fast. You could argue that the RAM was part of that (sorry I couldn’t take this one apart and make them even), but we ran a few more tests just to compare the impact of RAM.

 

To study the impact of RAM and further prove the need for better hardware, we go to my daily driver (home computer). It’s an eight-core machine with 64 GB of RAM and 1 TB of solid state (NVMe) storage connected via PCIe in a RAID 0 configuration, which is where the OS resides. And just to be safe, there’s a RAID 10 array of data center SSDs for data and file storage.

 

My rig.

 

For this test, we built a virtual machine, storing it on the RAID 10. It has four cores, 8 GB of RAM, and a 200 GB drive for the operating system. All of that is well below the specs of the last two machines and doesn’t meet the high-performance spec.

 

It could handle the same workflow in almost exactly 40 seconds. That’s more than four times faster.

 

Then, to really make a point, I set up another virtual machine. This time with just two cores and 4 GB of RAM. It gave impressive results, finishing at 53.8 seconds. Still at least three times faster than the laptop with high-performance specs.

 

| Machine | CPU | CPU Clock | CPU Cores | Primary Drive | Primary Drive Type | Primary Drive Interface | Secondary Drive | RAM | WF Run Time |
|---|---|---|---|---|---|---|---|---|---|
| Machine A | Intel i5-6440HQ | 3.5 GHz | 4 | 500 GB | Hard Disk Drive (7200 RPM) | SATA | (none) | 16 GB | 2:49 |
| Machine B | Intel Xeon E3-1271 v3 | 3.6 GHz | 4 | 512 GB | Solid State Drive | SATA | (none) | 32 GB | 1:04 |
| Machine C | Intel i9-9900K | 3.6 GHz (OC @ 4.81 GHz) | 8 | 1 TB | NVMe (RAID 0) | PCIe/M.2 (RAID 0) | (none) | 64 GB | 0:36 |
| Machine D (Virtual) | Intel i9-9900K (virtual) | 3.6 GHz | 4 | 200 GB | NVMe (RAID 0) | PCIe/M.2 (virtual SCSI) | Solid State, SATA, RAID 10 | 8 GB | 0:40 |
| Machine D (Virtual) | Intel i9-9900K (virtual) | 3.6 GHz | 2 | 200 GB | NVMe (RAID 0) | PCIe/M.2 (virtual SCSI) | Solid State, SATA, RAID 10 | 4 GB | 0:53 |

 

Here’s why it matters:

 

The most commonly used interface between a hard drive and motherboard is called SATA III (Serial AT Attachment, revision three). [*Fun fact: “AT” stands for Advanced Technology and was a term used by IBM for their second generation of personal computers (PC/AT).] SATA is a set of standard specifications, the third generation of which requires that data be able to move at 6 Gbit/s. Sounds fast, right? That’s only in theory, though. In reality, the speed at which the CPU can process incoming data and return processed data, and the speed at which the drive can send and receive data, are just as important.

 

Going back to the laptop: the hard disk drive uses a SATA III interface (the full 6 Gbit/s). However, when we run a benchmark (a test of actual performance), the drive can only read and write data at around 100 MB/s.
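(If you want to measure your own drive, a proper tool like CrystalDiskMark is the way to go, but even a crude sequential test, like this rough Python sketch, will expose the HDD/SSD gap. Treat the read number as optimistic, since the OS may serve part of the file from cache.)

```python
import os
import time

TEST_FILE = "drive_test.bin"
SIZE_MB = 1024  # write 1 GB; bigger files dilute caching effects
chunk = os.urandom(1024 * 1024)  # 1 MB of random data

# Sequential write, flushed to disk so we time the drive, not the RAM cache.
start = time.perf_counter()
with open(TEST_FILE, "wb") as f:
    for _ in range(SIZE_MB):
        f.write(chunk)
    f.flush()
    os.fsync(f.fileno())
print(f"write: {SIZE_MB / (time.perf_counter() - start):.0f} MB/s")

# Sequential read. The OS may serve this from cache, so treat it as a best case.
start = time.perf_counter()
with open(TEST_FILE, "rb") as f:
    while f.read(1024 * 1024):
        pass
print(f"read: {SIZE_MB / (time.perf_counter() - start):.0f} MB/s")

os.remove(TEST_FILE)
```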

 

The second machine, with an SSD for the primary drive, is also using a SATA III interface. But because the primary drive is an SSD, its benchmark was around 480 MB/s.

 

And finally, the third machine (and its virtual machine). This one uses PCIe (through an M.2 slot, to be exact) instead of SATA to connect the OS solid state drive to the motherboard. This version of PCIe has a theoretical transfer rate of 985 MB/s per lane; with four lanes, that gives a total theoretical read/write of 3,940 MB/s. When we ran a benchmark from inside the virtual machine, it was surprisingly good at roughly 1,200 MB/s. And of course, directly on the machine itself (not the VM) was the best, at around 2,700 MB/s.
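If you're wondering where those theoretical ceilings come from, here's the back-of-the-envelope arithmetic. The encoding overheads (8b/10b for SATA III, 128b/130b for PCIe 3.0) are the assumptions doing the work:

```python
# SATA III: 6 Gbit/s on the wire, 8b/10b encoding (10 bits transmitted per data byte).
sata3 = 6e9 / 10 / 1e6                      # ~600 MB/s usable ceiling

# PCIe 3.0: 8 GT/s per lane, 128b/130b encoding, four lanes on a typical M.2 slot.
pcie3_lane = 8e9 * (128 / 130) / 8 / 1e6    # ~985 MB/s per lane
pcie3_x4 = pcie3_lane * 4                   # ~3,940 MB/s total

print(f"SATA III ceiling:    {sata3:.0f} MB/s (HDD measured ~100, SSD ~480)")
print(f"PCIe 3.0 x4 ceiling: {pcie3_x4:.0f} MB/s (measured ~2,700 on bare metal)")
```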

 

There were more tests done than just these, but to spare you the narrative, I’ve summarized my recommendations below.

 

Recommendations

 

CPUs

 

  • Bigger is always better. Always.
  • Avoid CPUs advertised as “low energy” or “energy efficient.” These are most commonly found in laptops and are meant to restrict processing power to extend battery life. Always look at the model number’s suffix to spot these:
    • Intel's list can be found here
    • AMD isn’t as obvious, but you can use the wattage rating (TDP) to get an idea
  • If you’re using a virtual machine, Intel or AMD might be better based on other hardware factors. It’s hard to put a guideline around this. I’ve always had better luck with Intel though.

 

RAM

 

This is another “bigger is better” situation. However, if the rest of your hardware isn’t at the same level of performance, it won’t matter.

 

Memory clock speed and latency are the other factors to consider here. Clock speed is the number of times a read or write can be done per second; it usually looks like this: 3200 MHz. Latency is a series of timings that indicate the delay between the RAM receiving a command and being able to act on it. They’ll usually be listed like this: 16-18-18-36 or CL16. There isn’t a specific rule about how to pair these up, and there are dozens of possible configurations. Whether the RAM gets utilized 100% is also highly dependent on the motherboard and CPU specifications.
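That said, one rule of thumb helps when comparing kits: convert the CAS latency into nanoseconds, since a faster clock can cancel out a higher CL number. A quick sketch of the arithmetic (the 2000 comes from DDR transferring twice per clock cycle):

```python
def first_word_latency_ns(transfer_rate_mt_s: int, cas_latency: int) -> float:
    """Approximate first-word latency of a DDR kit in nanoseconds."""
    # cycle time (ns) = 2000 / transfer rate (MT/s); latency = CL cycles * cycle time
    return 2000 * cas_latency / transfer_rate_mt_s

print(first_word_latency_ns(3200, 16))  # DDR4-3200 CL16 -> 10.0 ns
print(first_word_latency_ns(3600, 18))  # DDR4-3600 CL18 -> 10.0 ns (a wash on latency)
```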

 

Drives

 

Avoid hard disk drives at all costs. As shown above, they will cripple your work. If your boss or IT department needs any further validation of that, point them to this.

 

Relative speeds of the same workflow:

 

  • Hard disk drive: 1 workflow in 2:49
  • Solid state drive (SATA): 1 workflow in 1:04
  • M.2 (NVMe): 1 workflow in 0:40

 

This is NOT a situation where bigger is better, but there also isn’t a single size recommendation. It’s a ratio of size to data I/O performance that just depends on what you’re using. Most people won’t notice the difference between a 500 GB and a 1 TB drive, but the difference between 500 GB and 4 TB would be noticeable if you’re having to retrieve data from that drive.


If you ever find yourself in the position of being able to custom build a machine and can’t decide on which combination of parts, check out userbenchmark.com. You can benchmark your own machine and then compare/build a better one from other users’ results.

 

Finally, don’t forget that an adequate cooling system is just as important!

Comments
SeanAdams
17 - Castor

Great article @patrick_mcauliffe - fully agree with your conclusion that IO speed is more important than size on disks; and IO is also more important than processor speed on most computers.

 

Also curious to learn more about the M.2 RAID 0 setup for your primary & boot, and the RAID 10 array - is this all in your home PC? If so, keen for some of your build tips on what hardware to use.

patrick_mcauliffe
14 - Magnetar

Thanks @SeanAdams. Yes, this is my home PC.

It's been a long-time hobby that has just gotten slightly out of control.

 

I've also rebuilt since this article.  There was a release of Win10 that had some type of conflict with the controller I was using on the RAID 10 array.  Performance dropped and certain file operations would give a BSOD.  After a few trials of custom drivers and modifications, I finally just gave up and replaced it with a SAS HBA.  That's working fairly well, but I'll probably make some changes again soon.

 

Specific hardware specs are constantly changing.  Here are some random topics that stick out as I think through previous builds:

 

The RAID 0 for a primary drive does give some truly awesome performance for certain operations, but it's not really worthwhile for Alteryx alone (as compared to a standalone drive or a RAID 1 using M.2 for the OS). However, some form of M.2 (RAID or not) for the OS is always worthwhile in my opinion.
If your motherboard doesn't have a built-in M.2 slot, it is possible to add one with a PCIe card. Unless you know what you're doing in the BIOS to set it as the boot drive (or are ready to try), I'd avoid it.

As it pertains to what hardware to use - it depends on what all of your use cases are for the machine and your total budget.

It also makes a difference if you want to start by building totally new all at once or add on over time.

https://pcpartpicker.com/ is a great way to keep track of which parts are compatible as you build.

If you don't know where to start and just want ideas for what types of parts go together, check out what other users have built at https://www.userbenchmark.com/PCBuilder and compare to what you have now.

I try to swap parts frequently by purchasing higher-end equipment secondhand and re-selling before the market for that spec bottoms out. For example, DDR3 memory dropped in price when the DDR4 spec was released to the market, but there was a lag between when DDR4 took the second price drop and when DDR3 was phased out of new machines - that's the time to sell.

There are usually a good number of sellers on eBay who are turning over parts from companies that cycle their hardware regularly.

 

If you're starting all new - 

Start with the processor type you want (or have) and work down. 

 

For the CPU, a higher core count runs more processes at once, and a higher clock speed runs individual processes faster. Find the balance of those that fits your budget and use.

 

Use the best performing drive for the files you use the most or need to access the fastest (like the OS). Right now, that's NVMe in the M.2/PCIe format.


You can still use an HDD for your less frequently used files by mapping the HDD volume to a folder within Windows (so it all appears to be part of the C drive) or by creating a separate named volume.

If you're not building a server, a motherboard from a workstation product line usually gives the best performance. Those cost a bit more, so if your budget is tighter, look at the gaming product lines.

In my experience, when your motherboard has M.2 slots, there's usually a trade-off: certain SATA ports get disabled when you utilize some or all of your M.2 slots. If that leaves you with too few SATA ports for the RAID configuration/required storage, then you have a few options when it comes to expansion (PCIe add-on).
If you're going to use software to set up a RAID or similar array (like an MS Storage Pool), make sure your expansion card has the option to function as JBOD (just a bunch of disks).
For drive expansion, you can use a SATA expansion card, RAID controller, SAS controller/HBA, etc. (there are some more very specific specialty products as well).
Generally, you just need a SATA expansion card. RAID and SAS controllers have additional functions over normal SATA expansion and can usually connect more storage drives with different connectors. They also require additional work to set up, with their own BIOS and firmware, while a SATA expansion card is typically plug and play.

 

SeanAdams
17 - Castor

Thank you for the detailed response @patrick_mcauliffe - looking forward to researching these SAS controllers/HBAs that you mentioned to see if this would be an option in my rig.

 

Thank you also for the tips - both on how to source hardware more cheaply, and on the power of M.2/NVMe drives to speed up the OS drive.

Have a good new year!

hellyars
13 - Pulsar

@patrick_mcauliffe I currently run Alteryx on an iMac Pro. Virtual machines are a pain and PDF processing (Intelligence Suite) is appallingly slow. So, I need a PC. It will be used primarily for Alteryx -- and a little gaming. Relative to Alteryx, is there a point of diminishing returns relating to single-core performance, multi-core performance, and/or memory (RAM)?

patrick_mcauliffe
14 - Magnetar

@hellyars as with all things - it depends. 
And with a VM on a Mac (in the few times I've had access to one), it seemed to require a lot of overhead just to run the VM.

What hardware do you have on the Mac?

hellyars
13 - Pulsar

@patrick_mcauliffe iMac Pro: 3 GHz 10-core Intel Xeon W, 64 GB DDR4 ECC RAM, 1 TB NVMe, Vega 64 Pro 16 GB. The Windows VM gets 5-6 cores (10-12 threads) and 32 GB of RAM.

patrick_mcauliffe
14 - Magnetar

@hellyars That would perform pretty well as a PC.

If you build it yourself, there's not really a point of diminishing returns (assuming a high budget). It's when you buy a pre-built that you get locked into certain hardware options and don't get to pick the trade-offs.

Taking your current hardware; let's assume it was a PC and think about what we would change.

 

NVMe - solid choice. My optimal setup, though, would have a second NVMe drive: separate the working directory and the data files onto different drives. This ensures they go through different paths on their way to the CPU.

 

64 GB of ECC RAM - that will handle large memory consumption in a single workflow or multiple concurrent workflows well. ECC is great to have. I wish it were in more widespread use, but it's obviously limited by cost.

 

Xeon is a solid architecture for a CPU. It's just a workhorse when it comes to running multiple applications/workflows concurrently.

If there was something I'd change about that, though, I'd get one with higher single-core performance or an unlocked model. For example, the Xeon W-3175X. Sure, it's clocked at 3.1 GHz and that's not a big jump over your current 3 GHz; but it has Turbo Boost up to 4.5 GHz, and all cores can be overclocked to around 5 GHz.

The higher clock speed is going to make a difference in your single-threaded processes, like tools using R and Python (PDF extraction).

hellyars
13 - Pulsar

@patrick_mcauliffe What are your thoughts regarding AMD Ryzen and Threadripper CPUs? I am looking at a Ryzen 9 5950X (16 cores, 32 threads, 3.4 GHz base clock, 5.0 GHz boost clock, 105 W TDP), 64 GB RAM (maybe not ECC), and 2x M.2 NVMe Gen 4 drives (1 boot/apps, 1 data).

patrick_mcauliffe
14 - Magnetar

@hellyars on paper, the R9 5950X looks very good.

I don't have enough recent experience with AMD products to provide worthwhile insight on the CPU.

The rest of your setup looks pretty solid.

 

It is more costly, but when possible, I try to keep data on a redundant drive.

For those, I use a minimum of RAID1 but ideally a RAID10.

(or the RAID-like config you can do with Windows Storage Spaces)

Right now, I use one of these:

https://www.broadcom.com/products/storage/host-bus-adapters/sas-nvme-9500-16i

Connected to a set of SAS drives and NVMe using this approach to tier performance:

https://nils.schimmelmann.us/post/153541254987/intel-smart-response-technology-vs-windows-10

 

Enterprise surplus SAS drives are fairly cheap on eBay these days. 

Or if you're more patient, try keeping an eye on: https://www.reddit.com/r/hardwareswap/

kaja1000000
5 - Atom

Thank you for this post, Patrick! I feel like there is a solution to my problem in it, but due to my limited hardware knowledge I still haven't managed to identify it...

So here is my situation (if you have any tip I would be extremely grateful):

We are trying to set up an AWS VM that would run our Alteryx workflows: millions of rows of data with a lot of Join/Union tools. Currently we are running these workflows on my ThinkPad P1 Gen 3 laptop (Intel Core i7-10875H, 64 GB RAM, 2 TB disk (Toshiba KXG6APNV2T04)), and it takes around 6 hours to run the longest workflow.
However, I was not able to find an EC2 instance that would perform as well as my laptop (trying not to exceed $2/h), and I tested a few of them using the workflow you mentioned (at this moment the zip file is around 800 MB):

 

My laptop: 56 s. I set a 16,313 MB memory limit because I noticed that increasing or decreasing it actually made the runtime worse (even though I have 64 GB of RAM).

c5.4xlarge (16 vCPU, 32 GB RAM, storage: 1,000 GiB general purpose SSD): 1 min 11 s

r5ad.2xlarge (8 vCPU, 64 GB RAM, storage: 1 x 300 GiB NVMe SSD): 1 min 40 s

r5d.4xlarge (16 vCPU, 128 GB RAM, storage: 2 x 300 GiB NVMe SSD): 1 min 20 s


It seems to me that the processors in these VMs are just not good enough; for example, my processor's max turbo frequency is 5.10 GHz.

Do you have any suggestions about which EC2 instance I should try next?

Thank you in advance,
Katarina

patrick_mcauliffe
14 - Magnetar

Hey @kaja1000000 

It sounds like you've run into the catch that comes along with using cloud computing when you need performance.

The CPUs generally used to back cloud instances aren't intended to outpace the single-core speed of a desktop.

Simply put, they're intended to run more processes in parallel, not accelerate a single process.

 

With that thought in mind though, have you tried running the workflow on your laptop with the AMP engine enabled in Alteryx?

 

There are, however, more considerations than just CPU.

Where does the data come from? Where does it go?

If your data originates in the cloud, then you might get better performance simply because there is less of a bottleneck getting the data onto those instances that have a 10G+ network connection.

If your data is already local, then I'd look at changing your workflow.

There's always more than one way to get things done.

AMP, batching, order of operations - these are all things to look at.

 

If you had to go with an EC2 instance though, here's how I would maximize performance (I've added in the stuff you already have so others can reference it):

Check out the specs vs cost (Google "ec2info")

Figure out what fits into what you can spend.

 

NVMe is preferred over SSD instance storage, although SSD isn't bad. Just be sure not to pick HDD or EBS-only storage.

 

I'd agree with the 8-16 vCPU range you're already looking at. It's generally a good cost/performance range on cloud right now.

 

Then check out the CPU model numbers: (Google ec2/instance-types) 
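(If you'd rather script the comparison, here's a rough boto3 sketch that pulls raw specs from the DescribeInstanceTypes API so you can rank sustained clock speeds yourself. The region, the vCPU window, and the instance-storage filter are just my assumptions from this thread, and pricing still has to come from elsewhere.)

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Page through every instance type that offers local (instance-store) disks,
# since EBS-only types put your temp files on network storage.
paginator = ec2.get_paginator("describe_instance_types")
candidates = []
for page in paginator.paginate(
    Filters=[{"Name": "instance-storage-supported", "Values": ["true"]}]
):
    for it in page["InstanceTypes"]:
        vcpus = it["VCpuInfo"]["DefaultVCpus"]
        if 8 <= vcpus <= 16:  # the sweet spot discussed above
            candidates.append((
                it["InstanceType"],
                vcpus,
                it["MemoryInfo"]["SizeInMiB"] // 1024,
                it.get("ProcessorInfo", {}).get("SustainedClockSpeedInGhz", 0),
            ))

# Sort fastest sustained clock first -- single-core speed matters for many tools.
for name, vcpus, mem_gb, ghz in sorted(candidates, key=lambda c: -c[3])[:10]:
    print(f"{name}: {vcpus} vCPU, {mem_gb} GiB RAM, {ghz} GHz sustained")
```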

 

Given that Alteryx in general, and sort and join operations in particular, tend to use memory heavily, you may want to check out the memory-optimized instance types first.

 

At the time of this posting, the z1d has a sustained all-core frequency of 4.0 GHz - not too bad, but not always cost-effective for everyone. Also, there's no model that will work best on all workflows.

I've had some run better on the R's than the z's and vice versa.

It's just a matter of figuring out the right combination.

kaja1000000
5 - Atom

Thank you so much Patrick! I've just tested the z1d.3xlarge (12 vCPU, 96 GB RAM, 450 GB NVMe SSD) and it ran the workflow in 1 min 2 s! I think we have a winner!

I tested the AMP engine before and it's not working for my workflow, I guess because I have multiple inputs, outputs, and parallel processes inside the same workflow. But yeah, I might try to optimize that.

Thank you so much again, this was really really helpful!

patrick_mcauliffe
14 - Magnetar

Hey @kaja1000000, that is awesome to hear!