Dear all,


I have a question about which Alteryx tool to use for matching large datasets. I have a source file of 4M+ records and a target file of 2K records, and I need to pull information from the source to update the target. The only way to match the two is by company name. As you can imagine, the company names can differ in punctuation ...


When I use the Join tool, only about 10% of the target records get updated.


When I use the Fuzzy Match tool, the workflow never finishes.


Your input is much appreciated.




I would recommend using the Find Replace tool, configured to append fields to the record.


This should get you more matches than the Join tool alone.


Also, I would add a cleansing tool before it to remove all punctuation, plus an earlier Find Replace to standardize misspellings and abbreviations.
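For anyone who wants to prototype the cleanup outside Alteryx first, here is a minimal Python sketch of the same idea: strip punctuation, expand abbreviations, then do an exact lookup on the cleaned name. The function names and the `ABBREVIATIONS` map are hypothetical and only for illustration; they are not Alteryx features, and the map would need to be built from the variants in your own data.

```python
import re
import unicodedata

# Hypothetical abbreviation map (an assumption for illustration);
# extend it with the variants that actually occur in your data.
ABBREVIATIONS = {
    "corp": "corporation",
    "inc": "incorporated",
    "ltd": "limited",
    "co": "company",
    "intl": "international",
}


def normalize_company(name: str) -> str:
    """Return a cleaned, lower-case version of a company name."""
    # Fold accented characters to plain ASCII where possible.
    text = unicodedata.normalize("NFKD", name)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = text.lower().replace("&", " and ")
    # Remove everything that is not a letter, digit, or whitespace.
    text = re.sub(r"[^a-z0-9\s]", "", text)
    # Expand known abbreviations token by token.
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


def exact_join(source_names, target_names):
    """Map each target name to the first source name with the same cleaned form."""
    index = {}
    for src in source_names:
        index.setdefault(normalize_company(src), src)
    return {tgt: index.get(normalize_company(tgt)) for tgt in target_names}


if __name__ == "__main__":
    sources = ["ACME Corporation", "Smith and Company"]
    targets = ["Acme, Corp.", "Smith & Co.", "Unknown Inc"]
    for target, match in exact_join(sources, targets).items():
        print(target, "->", match)
```

Because the lookup is a dictionary keyed on the cleaned name, it stays fast even with millions of source records, so it is worth running before any fuzzy step.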



With this much data, Fuzzy Match is very likely to never finish unless you use a "waterfall" method.


Set your match threshold (high or low, depending on your methodology), run the fuzzy match, and set aside the records that matched.


Then take the unmatched records, change the match thresholds, and run the fuzzy match again.


Keep incrementally changing the thresholds. Once you've got a satisfactory match percentage, you can union all the outputs from prior fuzzy matches.
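The control flow of the steps above can be sketched in Python as follows. This is only an illustration of the waterfall loop, not how Alteryx's Fuzzy Match tool works internally: it uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity score, and `waterfall_match` and its helpers are hypothetical names. The threshold order is left to the caller, so either a high-to-low or low-to-high sequence can be tried.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score between 0.0 and 1.0 (an assumption, not Alteryx's)."""
    return SequenceMatcher(None, a, b).ratio()


def best_match(name, candidates):
    """Return the candidate with the highest score, or (None, 0.0) if nothing overlaps."""
    best, best_score = None, 0.0
    for cand in candidates:
        score = similarity(name, cand)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score


def waterfall_match(targets, sources, thresholds):
    """Run one pass per threshold; only still-unmatched targets go to the next pass."""
    matched = {}
    remaining = list(targets)
    for threshold in thresholds:
        still_unmatched = []
        for name in remaining:
            cand, score = best_match(name, sources)
            if cand is not None and score >= threshold:
                # Set the match aside and record which pass accepted it.
                matched[name] = (cand, score, threshold)
            else:
                still_unmatched.append(name)
        remaining = still_unmatched
    # 'matched' plays the role of the union of all per-pass outputs.
    return matched, remaining


if __name__ == "__main__":
    sources = ["acme corporation", "globex limited", "initech incorporated"]
    targets = ["acme corporation", "globex limted", "zzzz"]
    matched, unmatched = waterfall_match(targets, sources, [1.0, 0.9, 0.8])
    print(matched)
    print(unmatched)
```

This brute-force version compares every remaining target with every source name, so it is only practical on small samples. Recording the score and the accepting threshold with each match makes it easier to review where the useful cutoff lies.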


I hope that makes sense. I got the waterfall technique from this training video:


The suggestion to start with a low threshold came from another solutions engineer, who recommended it so that you don't go through successive iterations only to find that you set the cutoff at 65% when 64% was really the magic number.

