Is there any way to calculate the disk space required for a fuzzy match temp file? I have a fuzzy match running with 12,000 records in one dataset and 1,000,000 in the other, and at 78% complete it has hit 100 GB. I only have 100 GB available for the temp file at the moment. Rather than guess for the next attempt, I wondered if there is a way to calculate it?
Not sure that there is a way to calculate disk space prior to executing the workflow. However, you do have some options to optimize and better understand your workflow...
- Convert your input files to .yxdb, as this is Alteryx's native file type and the one the Alteryx Engine reads most efficiently
- Ensure data is prepped beforehand and if working with addresses, leverage the CASS functionality
- Always start with pre-configured match styles
- Join your data on exact matches before trying to run a Fuzzy Match
- Enable Performance Profiling: this option can be found by clicking a white area on the canvas, navigating to Runtime in the Configuration pane, and then selecting the check mark for 'Enable Performance Profiling.' This option will allow you to see a milliseconds and percentage breakdown per tool in your workflow.
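To illustrate why joining on exact matches first helps, here is a minimal Python sketch of the idea, outside of Alteryx. The function names (`split_exact_matches`, `normalize`) and the sample records are hypothetical and only for illustration; in a workflow the equivalent would be a Join tool placed ahead of the Fuzzy Match tool. Only the records left unmatched need to go through the expensive fuzzy comparison.

```python
def normalize(name):
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(name.lower().split())


def split_exact_matches(left, right, key):
    """Split `left` into records with an exact key match in `right` and the rest.

    Only the second list needs to be sent on to fuzzy matching.
    """
    right_keys = {key(record) for record in right}
    exact = [record for record in left if key(record) in right_keys]
    remaining = [record for record in left if key(record) not in right_keys]
    return exact, remaining


# Hypothetical sample data standing in for the two datasets.
left = ["Acme Corp", "Globex  LLC", "Initech"]
right = ["acme corp", "Globex LLC", "Umbrella Inc"]

exact, remaining = split_exact_matches(left, right, normalize)
print("Matched exactly:", exact)
print("Still needs fuzzy matching:", remaining)
```

Every record removed from the smaller dataset this way removes its comparisons against the entire larger dataset, which is where the temp file growth comes from.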
Also something to keep in mind: this is an excellent use case for Alteryx Server, and that is likely your best bet. With Alteryx Server, you can offload some of that heavy lifting from the desktop to a server machine. You will also be able to schedule the process.
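For a rough sense of scale before the next attempt, the numbers in the question can be extrapolated. This is only a back-of-envelope sketch, not an Alteryx feature: it assumes temp file size grows roughly linearly with the reported percent complete, which may not hold, and the function names are hypothetical. Since the run stopped when the disk filled, treat the result as a lower bound and leave headroom.

```python
def estimate_total_temp_gb(observed_gb, fraction_complete):
    """Linear extrapolation: assumes temp usage scales with reported progress."""
    if not 0 < fraction_complete <= 1:
        raise ValueError("fraction_complete must be in (0, 1]")
    return observed_gb / fraction_complete


def candidate_pairs(n_left, n_right):
    """Worst-case number of record pairs if every record is compared with every other."""
    return n_left * n_right


# Figures taken from the question: 100 GB used at 78% complete,
# 12,000 records in one dataset and 1,000,000 in the other.
estimate = estimate_total_temp_gb(100, 0.78)
pairs = candidate_pairs(12_000, 1_000_000)

print(f"Projected temp file size: ~{estimate:.0f} GB")
print(f"Worst-case candidate pairs: {pairs:,}")
```

Under that linear assumption the run would need at least roughly 128 GB, and the worst case of 12 billion candidate pairs shows why trimming either input before the fuzzy match pays off.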