
Alteryx Designer Discussions


Fuzzy Matching SLOOOOOWWWWWW ..... could use advice.

Watermark

Could use some advice on potential approaches/solutions.

 

I have a fuzzy match process that matches 3,300 records in an .xlsx file against a YXDB of 72 million records. It has been running since 2:15 PM yesterday, and at 10 AM this morning it was only about 60% of the way through the Fuzzy Match tool. It reached about 50% within a couple of hours, but it took from 6 PM last night until this morning to get from 52% to 60%. Over the last 2-3 hours it has only moved to about 65%.

 

Both files are on my internal D: drive, a 2TB Samsung 970 EVO Plus with 1.5TB available (only 400GB used), so disk space really shouldn't be the issue.

 

Does anyone have thoughts on this? I'd hate to restart if it can gradually plod its way through, but I'm not sure I can afford another 24 hours.

 

My system is pretty good: a plain Join between these same two files typically takes about 5 minutes.

 

TYIA

 

PS: there are 2 match criteria, a custom match on 'Name' and an exact match on 'state'.

drew9

Hi @Watermark,

 

72 million records is a lot of data to fuzzy match against, even if your other file is only 3,300 records, so I'm not surprised it's taking this long. What are your configuration settings? Are you running in purge or merge mode, and what is your match threshold?
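To put that volume in perspective, here is a back-of-envelope count of worst-case pairwise comparisons using the record counts from the original post. It is only arithmetic, not a model of how Alteryx actually schedules comparisons internally, and the even 50-way split by state is a hypothetical simplification (real state populations are skewed). By contrast, an exact join can sort or hash on its key, so its work grows roughly with the row counts rather than with their product, which is consistent with the Join finishing in minutes.

```python
# Back-of-envelope worst-case comparison counts. NOT a model of Alteryx's
# internal scheduling; the even 50-state split is a hypothetical assumption.
SMALL = 3_300         # records in the .xlsx file (from the post)
LARGE = 72_000_000    # records in the YXDB (from the post)


def merge_pairs(small: int, large: int) -> int:
    """Worst case for merge mode: every small record vs. every large record."""
    return small * large


def purge_pairs(n: int) -> int:
    """Worst case for purge mode: every record vs. every other record in one input."""
    return n * (n - 1) // 2


def blocked_merge_pairs(small: int, large: int, n_blocks: int) -> int:
    """Merge pairs if an exact key (e.g. state) split the large input into
    n_blocks equal blocks, so each small record only sees large / n_blocks rows."""
    return small * (large // n_blocks)


print(f"{merge_pairs(SMALL, LARGE):,}")              # 237,600,000,000
print(f"{purge_pairs(LARGE):,}")                     # 2,591,999,964,000,000
print(f"{blocked_merge_pairs(SMALL, LARGE, 50):,}")  # 4,752,000,000
```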

 

If you're running in purge mode, the process will take much longer, especially at your data volume, because purge mode compares records within the same input against each other. Your match threshold will also affect run time. I would try splitting your YXDB file into subsets and going from there. If your configuration is all good, there's not much else you can do. You could also try enabling the AMP engine, which lets Alteryx use multithreading.
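A minimal sketch of the "split into subsets" idea, assuming the exact 'state' criterion is used as the split key: index the large input by state once, then fuzzy-compare names only within the matching state. This is an illustration in Python, not Alteryx's Fuzzy Match algorithm. difflib's `SequenceMatcher.ratio()` stands in for the custom 'Name' match function, and the function names, the `(record_id, name, state)` record layout, and the 0.8 default threshold are all hypothetical. Holding 72 million rows in a Python dict would need a lot of RAM; the point is the shape of the work, not a drop-in replacement.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for a fuzzy 'Name' criterion: difflib ratio on
    lowercased, trimmed names (0.0 = no overlap, 1.0 = identical)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def blocked_fuzzy_match(small, large, threshold=0.8):
    """small, large: iterables of (record_id, name, state) tuples.

    Returns (small_id, large_id, score) for pairs in the SAME state whose
    name similarity is >= threshold. Records in other states are never compared.
    """
    # Build the exact-key index on the large input once.
    by_state = defaultdict(list)
    for rec_id, name, state in large:
        by_state[state].append((rec_id, name))

    matches = []
    for s_id, s_name, s_state in small:
        # Only scan the block that shares this record's state.
        for l_id, l_name in by_state.get(s_state, []):
            score = name_similarity(s_name, l_name)
            if score >= threshold:
                matches.append((s_id, l_id, score))
    return matches


small = [(1, "Acme Corp", "TX")]
large = [
    (101, "ACME Corporation", "TX"),
    (102, "Acme Corp", "CA"),   # identical name, but a different state
    (103, "Zenith LLC", "TX"),
]
print(blocked_fuzzy_match(small, large, threshold=0.7))  # [(1, 101, 0.72)]
```

Record 102 has an identical name but is never scored, because the exact 'state' key keeps it out of the TX block; that skipped work is where the savings come from at scale.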

 

Hope this helps.

 


 

Watermark

I don't have it running in AMP. The YXDB is an AMP YXDB, but almost every process I've tried to run in AMP has actually taken longer, so I mostly shy away from it except for some very basic flows that read data. I'm not running in purge mode, only merge. It read all the records just fine and is currently plugging through the Fuzzy Match tool.

 

I've attached my fuzzy configuration. 

 

[Screenshot: Fuzzy Match tool configuration]

 

 


 
