Fuzzy Match temp file size




Is there any way to calculate the disk space required for a Fuzzy Match temp file? I have a Fuzzy Match running with 12,000 records in one dataset and 1,000,000 in the other, and at 78% complete the temp file has hit 100GB. I only have 100GB available for the temp file at the moment. Rather than guess for the next attempt, I wondered if there is a way to calculate it?
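One rough way to size the next attempt is to extrapolate linearly from the progress snapshot above. This is a minimal sketch that assumes temp file usage grows in proportion to the reported percentage complete, which is an assumption rather than documented Alteryx behaviour; the function name `estimate_total_temp_gb` is hypothetical. Because 100GB is where the disk ran out, treat the result as a lower bound and leave extra headroom.

```python
def estimate_total_temp_gb(used_gb: float, fraction_complete: float) -> float:
    """Linearly extrapolate total temp usage from a progress snapshot.

    Assumption: temp usage grows in proportion to progress. This is not
    documented Alteryx behaviour, so treat the result as a rough lower bound.
    """
    if not 0 < fraction_complete <= 1:
        raise ValueError("fraction_complete must be in (0, 1]")
    return used_gb / fraction_complete


# Numbers from the question: 100GB used at 78% complete.
estimate = estimate_total_temp_gb(100.0, 0.78)
print(f"{estimate:.1f} GB")  # 128.2 GB
```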





Hi @alexisjensen,


I'm not aware of a way to calculate the required disk space before executing the workflow. However, you do have some options to optimize your workflow and better understand its performance:


- Convert your input files to .yxdb, as this file type is highly indexed for the Alteryx Engine

- Ensure the data is prepped beforehand and, if you are working with addresses, leverage the CASS functionality

- Always start with pre-configured match styles

- Join your data on exact matches before trying to run a Fuzzy Match

- Enable Performance Profiling: click a blank area of the canvas, go to Runtime in the Configuration pane, and check 'Enable Performance Profiling.' This option will show you how long each tool in your workflow takes, in milliseconds and as a percentage of the total run time.

- There's also a great article on optimizing your workflow found here
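To see why joining on exact matches first helps, the sketch below splits off records with an exact key match so that only the remainder needs fuzzy comparison. It is a plain Python illustration, not how the Alteryx Join or Fuzzy Match tools work internally; the `name` field, the lowercase/whitespace normalization, and the helpers `normalize` and `split_exact_matches` are all hypothetical. For the numbers in the question, a naive all-pairs comparison would be 12,000 × 1,000,000 = 12 billion pairs, so every record removed by an exact join shrinks the worst case by 1,000,000 comparisons. That count is only an upper bound, since Fuzzy Match uses match keys to limit which records are compared.

```python
def normalize(value: str) -> str:
    # Hypothetical normalization: case-fold and collapse runs of whitespace.
    return " ".join(value.lower().split())


def split_exact_matches(left, right, key):
    """Pair left records with right records whose normalized key is identical.

    Returns (exact_pairs, unmatched_left); only unmatched_left needs fuzzy matching.
    """
    # Index the right-hand dataset once so each lookup is a dict hit, not a scan.
    index = {}
    for rec in right:
        index.setdefault(normalize(rec[key]), []).append(rec)

    exact_pairs, unmatched_left = [], []
    for rec in left:
        hits = index.get(normalize(rec[key]))
        if hits:
            exact_pairs.extend((rec, hit) for hit in hits)
        else:
            unmatched_left.append(rec)
    return exact_pairs, unmatched_left


left = [{"name": "Acme Ltd"}, {"name": "Globex"}, {"name": "Initech  Corp"}]
right = [{"name": "acme ltd"}, {"name": "Initech Corporation"}, {"name": "Umbrella"}]

pairs, remaining = split_exact_matches(left, right, "name")
print(len(pairs), len(remaining))                            # 1 2
print(len(left) * len(right), len(remaining) * len(right))   # 9 6
```

Roughly speaking, the Alteryx equivalent is a Join tool on the cleaned key, with the unjoined left (L) output feeding the Fuzzy Match.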


Also, keep in mind that this is an excellent use case for Alteryx Server, which is likely your best bet. With Alteryx Server, you can offload some of the heavy lifting from your desktop to a server machine, and you can also schedule this process.