Alteryx Designer Desktop Discussions

faiqz · ‎05-26-2022

Hi, I have around 8 million records, and one of the fields needs to be matched against another dataset that contains 21 records, because the values in that field contain misspellings. However, it is taking much longer than I expected. I started it 10 hours ago and it has only processed about 818,000 records so far. Do any of you know how to speed up the Fuzzy Match?

Thank you.
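For a sense of scale, the figures in the question can be projected forward. This is a rough sketch that assumes the processing rate stays constant, which a real run may not do:

```python
# Rough projection of total run time from the progress reported in the question.
# Assumes a constant processing rate, which is a simplification.
TOTAL_RECORDS = 8_000_000
PROCESSED = 818_000
HOURS_ELAPSED = 10

rate_per_hour = PROCESSED / HOURS_ELAPSED                      # records per hour
total_hours = TOTAL_RECORDS / rate_per_hour                    # projected full run
remaining_hours = (TOTAL_RECORDS - PROCESSED) / rate_per_hour  # time still to go

print(f"rate: {rate_per_hour:,.0f} records/hour")
print(f"projected total: {total_hours:.1f} h, remaining: {remaining_hours:.1f} h")
```

At that rate the full 8 million records would take roughly 98 hours, which is why the optimizations suggested in the replies are worth the effort.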

jbichachi003 · ‎05-26-2022

Hi @faiqz,

Have you tried using the AMP Engine? You can turn this feature on in the "Runtime" section of the workflow's Configuration, and it could help improve your run speed.

ddiesel · ‎05-27-2022

Hi @faiqz !

I second @jbichachi003's suggestion on the AMP engine if you aren't using that already.

I also wanted to add that I recently used some of the tips in this article to optimize one of my Fuzzy Match workflows:

https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Tips-and-Tricks-for-Fuzzy-Matching/...

This section has some great tips for optimizing processing time:

7. Optimizing fuzzy matching processing time:

Because fuzzy matching can require you to run your module many times, it is prudent to prep your data and save it out to a .yxdb file. Saving your data to .yxdb files allows you to use them as Inputs to your fuzzy matching module. Alteryx can read a .yxdb file faster than other file types, so this is a great place to start with optimization.
Another data preparation step is to use the Auto Field tool, which allows Alteryx to select the most appropriate field type and length for every field in your dataset. Depending upon your input data, this can provide dramatic improvements in speed.
Assuming you will be doing a merge fuzzy match, your files will require both a record ID field and a source field; you might as well add them now.
Also, there is no point in bringing fields you do not need into your Fuzzy Match module; use a Select tool to remove them now.
Finally, use your newly optimized files as .yxdb Inputs to your fuzzy matching module. To summarize: prep data in one module, then Fuzzy Match it in another.
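The prep steps in the quoted tip happen inside Alteryx tools, but the idea can be sketched in plain Python. This is a minimal illustration, not how Alteryx works internally: it assumes each dataset is a list of dicts, and the names `MatchRecord`, `prep_for_merge_match`, and `cross_source_pairs` are hypothetical. It keeps only the field being matched (the Select tool step), tags every row with a record ID and a source, and shows that a merge-style match only pairs records from different sources.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MatchRecord:
    record_id: int  # unique across both datasets, like a RecordID field
    source: str     # which dataset the row came from, like a Source field
    value: str      # the one field being fuzzy matched

def prep_for_merge_match(dirty_rows, clean_rows, field):
    """Keep only `field` and tag every row with a record ID and a source (hypothetical helper)."""
    combined = []
    next_id = 1
    for source, rows in (("dirty", dirty_rows), ("clean", clean_rows)):
        for row in rows:
            # Like a Select tool: every field except `field` is dropped here.
            combined.append(MatchRecord(next_id, source, row[field]))
            next_id += 1
    return combined

def cross_source_pairs(records):
    """Yield candidate pairs; a merge-style match never pairs a source with itself."""
    dirty_side = [r for r in records if r.source == "dirty"]
    clean_side = [r for r in records if r.source == "clean"]
    for d in dirty_side:
        for c in clean_side:
            yield d, c

# Hypothetical sample data: "amount" is not needed for matching, so it is dropped.
dirty = [{"city": "Chicgo", "amount": 10}, {"city": "Boston", "amount": 5}]
clean = [{"city": "Chicago"}, {"city": "Boston"}]
records = prep_for_merge_match(dirty, clean, "city")
pairs = list(cross_source_pairs(records))
print(len(records), len(pairs))
```

With the question's sizes, an exhaustive cross-source comparison would mean 8,000,000 × 21 = 168,000,000 candidate pairs, so carrying only the match field through the comparison keeps each one as light as possible.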

Also, the suggestion to "use a join to remove any exact matches from the fuzzy match process" was especially helpful to my use case.
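The "remove exact matches first" idea can also be sketched in Python. This is a hedged illustration under stated assumptions: `match_with_exact_join_first` is a hypothetical helper, a set lookup stands in for the Join tool, and the standard library's `difflib.get_close_matches` stands in for the Fuzzy Match tool's own algorithms, so its scores will not match Alteryx's. Only values that fail the exact lookup reach the costly fuzzy step.

```python
import difflib

def match_with_exact_join_first(values, reference, cutoff=0.8):
    """Resolve each value to a reference spelling, trying an exact join first.

    Returns (matches, fuzzy_count): matches is a list of (value, match-or-None)
    in input order; fuzzy_count is how many values needed the fuzzy step.
    """
    reference_set = set(reference)
    matches = []
    fuzzy_count = 0
    for value in values:
        if value in reference_set:
            # Exact match: a cheap hash lookup, standing in for a Join tool.
            matches.append((value, value))
            continue
        # Only values that failed the exact join reach the fuzzy step.
        fuzzy_count += 1
        best = difflib.get_close_matches(value, reference, n=1, cutoff=cutoff)
        matches.append((value, best[0] if best else None))
    return matches, fuzzy_count

reference = ["Chicago", "Boston", "Denver"]  # hypothetical clean list
values = ["Chicago", "Chicgo", "Boston", "Bostn", "Xyz"]
matches, fuzzy_count = match_with_exact_join_first(values, reference)
print(fuzzy_count)  # 3
print(matches)
```

If most of the 8 million values are already spelled correctly, the exact lookup resolves them cheaply and only the misspelled remainder pays for fuzzy comparison.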

Take a look and let us know if any of these suggestions work for you.

Thanks,
Deb

Fuzzy Match Optimization