Could use some advice on potential approaches/solutions.
I have a fuzzy match process: 3,300 records in an .xlsx file matched against a YXDB of 72 million records. It's been running since 2:15 yesterday afternoon, and as of 10 am this morning it was only about 60% of the way through the Fuzzy Match tool. It reached about 50% within a couple of hours, but it took from 6 pm last night until this morning to get from 52% to 60%, and over the last 2-3 hours it has moved up to about 65%.
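A rough way to estimate the remaining time is to extrapolate from the progress rates in the numbers above. This is a minimal sketch that assumes progress keeps moving at a constant rate, which the Fuzzy Match tool's progress indicator doesn't guarantee, and it takes 2.5 hours as the midpoint of the "2-3 hours" window:

```python
def progress_rate(pct_from: float, pct_to: float, hours: float) -> float:
    """Percentage points completed per hour over an observed window."""
    return (pct_to - pct_from) / hours


def hours_remaining(current_pct: float, rate: float) -> float:
    """Hours to reach 100%, assuming the given rate holds (linear extrapolation)."""
    return (100.0 - current_pct) / rate


overnight = progress_rate(52, 60, 16)  # 6 pm -> 10 am: 52% to 60%
morning = progress_rate(60, 65, 2.5)   # assumed 2.5 h (midpoint of "2-3 hours"): 60% to 65%

for label, rate in [("overnight", overnight), ("morning", morning)]:
    print(f"{label} rate: {rate:.2f} pts/h -> {hours_remaining(65, rate):.1f} h left from 65%")
```

The two windows give very different answers (70 hours at the overnight rate versus 17.5 hours at the morning rate), which suggests the percentage isn't advancing linearly, so treat either figure as a rough bound rather than a forecast.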
Both files are on my internal D: drive, a 2 TB Samsung 970 EVO Plus with 1.5 TB available (only 400 GB used), so disk space really shouldn't be the issue.
Anyone have thoughts on this? I'd hate to restart if it can gradually plod its way through, but I'm not sure I can afford another 24 hours.
My system is pretty good: if I'm just doing a Join between these same two files, it typically takes about 5 minutes.
72 million records is a lot of data to fuzzy match on, even if your other file is only 3,300 records, so I'm not surprised it's taking this long. What are your configuration settings? Are you running in purge mode or merge mode, and what is your match threshold?
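To illustrate what the merge/purge setting and the match threshold control, here is a toy Python sketch. It uses `difflib.SequenceMatcher` as a stand-in similarity score, which is NOT the scoring Alteryx's Fuzzy Match tool uses, and it compares every eligible pair, whereas (as far as I know) the real tool also uses generated match keys to decide which records get compared at all. The record names are hypothetical:

```python
from difflib import SequenceMatcher
from itertools import combinations, product


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; a stand-in, not Alteryx's scoring."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def merge_mode(left, right, threshold):
    """Compare only pairs with one record from each source.

    Returns (number of comparisons, pairs scoring at or above threshold)."""
    pairs = list(product(left, right))
    matches = [(a, b) for a, b in pairs if similarity(a, b) >= threshold]
    return len(pairs), matches


def purge_mode(records, threshold):
    """Compare every record with every other record from a single source."""
    pairs = list(combinations(records, 2))
    matches = [(a, b) for a, b in pairs if similarity(a, b) >= threshold]
    return len(pairs), matches


small = ["Acme Corp", "Globex LLC"]                    # hypothetical records
large = ["ACME Corporation", "Acme Corp.", "Initech"]  # hypothetical records

print(merge_mode(small, large, threshold=0.8))
print(purge_mode(small + large, threshold=0.8))
```

Merge mode compares 2 × 3 = 6 pairs, while purge mode over the same five records compares 10. The threshold decides which compared pairs count as matches: "Acme Corp" vs. "ACME Corporation" scores 0.72 here, so it is rejected at 0.8 but accepted at 0.7.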
If you're running in purge mode, the process will take dramatically longer, because all records are matched against each other rather than only against the other source, and with your volume of data that adds up quickly. Your match threshold will affect this as well. I would try splitting your YXDB file into subsets and going from there. If your configuration is all good, there's not much else you can do. You could also try enabling the AMP engine, which lets Alteryx use multithreading.
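To put rough numbers on the purge-versus-merge difference and on splitting the YXDB into subsets, here is a back-of-the-envelope Python sketch. The pair counts are naive upper bounds that assume every eligible pair is compared; the real tool narrows candidates with match keys, so actual comparison counts should be lower. The chunk count of 8 is an arbitrary example:

```python
from math import comb

SMALL = 3_300       # records in the .xlsx file (from the post)
LARGE = 72_000_000  # records in the YXDB (from the post)

# Merge mode: each small-file record against each large-file record.
merge_pairs = SMALL * LARGE
# Purge mode over both files stacked together: every record against every other.
purge_pairs = comb(SMALL + LARGE, 2)

print(f"merge upper bound: {merge_pairs:,} pairs")
print(f"purge upper bound: {purge_pairs:,} pairs")
print(f"purge / merge: about {purge_pairs / merge_pairs:,.0f}x")


def chunk_bounds(total: int, n_chunks: int):
    """Yield (start, stop) row ranges splitting `total` rows into n_chunks near-equal subsets."""
    size, extra = divmod(total, n_chunks)
    start = 0
    for i in range(n_chunks):
        stop = start + size + (1 if i < extra else 0)  # spread any remainder over the first chunks
        yield start, stop
        start = stop


# Splitting the YXDB into 8 subsets (hypothetical chunk count).
for start, stop in chunk_bounds(LARGE, 8):
    print(f"rows {start:,}-{stop:,}: {SMALL * (stop - start):,} merge pairs")
```

Purge mode's upper bound is roughly 11,000 times the merge-mode bound of 237,600,000,000 pairs. Splitting doesn't change the merge-mode total (the per-chunk counts add back up to the same figure), but each subset becomes a separate, shorter run, so a stalled or failed run only costs that one chunk.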
I'm not running it in AMP. The YXDB is an AMP YXDB, but almost every process I've tried to run in AMP has actually taken longer, so I mostly shy away from it except for some very basic flows that just read data. I'm not running in purge mode, only merge. It has read all the records just fine and is currently plugging away in the Fuzzy Match tool.