Dear all,
I have a question about which Alteryx tool to use when matching large datasets. I have a source file of 4M+ records and a target file of 2K records. I need to pull information from the source data to update the target. The only field I can match the two on is company name, and, as you might imagine, the company names can differ in punctuation ...
When I use the Join tool, only about 10% of the target records get updated.
When I use the Fuzzy Match tool, it never finishes.
Your input is much appreciated.
Thanks,
Andy
I would recommend using the Find Replace tool configured with "Append Field(s) to Record".
This should help you get more matches.
I would also add a Data Cleansing tool to remove all punctuation, and before it a Find Replace step that standardizes common misspellings and abbreviations.
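To make the cleaning-then-lookup idea concrete outside Alteryx, here is a minimal Python sketch, assuming the data fits in memory as lists of dicts. The `REPLACEMENTS` table, the `normalize_company` and `append_fields` helpers, and the `company` and `duns` field names are all hypothetical illustrations, not Alteryx APIs. It normalizes both sides the same way, then does an exact lookup that copies source fields onto matching target rows, roughly what Find Replace with "Append Field(s) to Record" does.

```python
import string

# Hypothetical misspelling/abbreviation table. In Alteryx this would be the
# lookup input to the Find Replace tool; extend it from your own data.
REPLACEMENTS = {
    "incorporated": "inc",
    "corporation": "corp",
    "company": "co",
    "limited": "ltd",
}

# Deletes ASCII punctuation, roughly what a "remove punctuation" cleanse does.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize_company(name: str) -> str:
    """Lowercase, delete punctuation, standardize words, collapse whitespace."""
    text = name.lower().translate(_STRIP_PUNCT)
    return " ".join(REPLACEMENTS.get(tok, tok) for tok in text.split())


def append_fields(target, source, key="company"):
    """Exact join on the normalized key, copying source fields onto target rows.

    target, source: lists of dicts. If several source rows normalize to the
    same name, the first one wins. Unmatched target rows pass through unchanged.
    """
    lookup = {}
    for row in source:
        lookup.setdefault(normalize_company(row[key]), row)

    result = []
    for row in target:
        merged = dict(row)
        match = lookup.get(normalize_company(row[key]))
        if match is not None:
            # Copy every source field except the key itself onto the target row.
            merged.update({f: v for f, v in match.items() if f != key})
        result.append(merged)
    return result


if __name__ == "__main__":
    source = [{"company": "ACME Incorporated", "duns": "123"}]
    target = [{"company": "Acme, Inc."}, {"company": "Other Co"}]
    print(append_fields(target, source))
    # [{'company': 'Acme, Inc.', 'duns': '123'}, {'company': 'Other Co'}]
```

Replacing whole words rather than substrings is a deliberate choice here: it keeps an entry like "company" from rewriting part of a longer word.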
Cheers
On data this size, Fuzzy Match is likely never to finish unless you use a "waterfall" method.
Set your match criteria (either high or low thresholds, depending on your methodology), run the fuzzy match, and set aside the records that matched.
Then take the unmatched records, change the match thresholds, and run the fuzzy match again.
Keep incrementally changing the thresholds. Once you've got a satisfactory match percentage, you can union all the outputs from prior fuzzy matches.
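The waterfall loop can be sketched in plain Python. This is an illustration under our own assumptions, not Alteryx's Fuzzy Match internals: difflib's `SequenceMatcher.ratio()` stands in for the similarity score, and the `passes` list, pass labels, thresholds, and sample company names are hypothetical. Each pass only sees targets that earlier passes left unmatched, and the per-pass results are unioned and tagged with the pass that matched them, so you can review them pass by pass. The caller decides the order of the passes.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> float:
    """Case-insensitive character similarity in [0, 1] (difflib's ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_sort_ratio(a: str, b: str) -> float:
    """Same ratio after sorting words, so word order differences don't count."""
    def sort_words(s: str) -> str:
        return " ".join(sorted(s.lower().split()))
    return ratio(sort_words(a), sort_words(b))


def best_match(name, candidates, scorer):
    """Brute-force search for the highest-scoring candidate."""
    best, best_score = None, 0.0
    for cand in candidates:
        score = scorer(name, cand)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score


def waterfall_match(targets, sources, passes):
    """Run ordered (label, scorer, threshold) passes over the unmatched targets.

    Matches from every pass are unioned and tagged with the pass label;
    targets that no pass matched are returned separately.
    """
    unmatched = list(targets)
    matched = []
    for label, scorer, threshold in passes:
        remaining = []
        for name in unmatched:
            cand, score = best_match(name, sources, scorer)
            if cand is not None and score >= threshold:
                matched.append({"target": name, "source": cand,
                                "score": round(score, 3), "pass": label})
            else:
                remaining.append(name)  # carried into the next, looser pass
        unmatched = remaining
        if not unmatched:
            break
    return matched, unmatched


if __name__ == "__main__":
    sources = ["Acme Inc", "Globex Corporation", "Initech LLC"]
    targets = ["ACME Inc", "Corporation Globex", "Initeck LLC", "Umbrella Co"]
    passes = [("strict", ratio, 0.95),
              ("reordered", token_sort_ratio, 0.95),
              ("loose", ratio, 0.85)]
    matched, unmatched = waterfall_match(targets, sources, passes)
    for m in matched:
        print(m)
    print("unmatched:", unmatched)
```

This sketch compares every remaining target against every source. At 4M × 2K records that is about 8 billion pairs per pass, so a real run would need some way to narrow the candidate sources first.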
I hope that makes sense. I got the waterfall technique from this training video: https://community.alteryx.com/t5/Live-Training/Live-Training-Fuzzy-Matching-Intermediate-Users/td-p/...
The suggestion to start with a low threshold came from another solutions engineer. The reasoning is that you avoid running successive iterations only to find that you set the cutoff at 65% when 64% is really the magic number.
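One way to act on the start-low advice, sketched under our own assumptions: score each target once with a low floor, then count how many targets each candidate cutoff would keep. That way the 64% vs 65% question can be answered from a single scoring run. difflib's `ratio()` again stands in for the real match score. The helper names `best_scores` and `matches_per_cutoff`, the 0.5 floor, and the sample data are hypothetical.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> float:
    """Case-insensitive character similarity in [0, 1] (difflib's ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_scores(targets, sources, floor=0.5):
    """Score each target once against every source; keep best scores >= floor."""
    scores = {}
    for name in targets:
        best = max((ratio(name, s) for s in sources), default=0.0)
        if best >= floor:
            scores[name] = best
    return scores


def matches_per_cutoff(scores, cutoffs):
    """Count how many targets would match at each candidate cutoff."""
    return {c: sum(1 for s in scores.values() if s >= c) for c in cutoffs}


if __name__ == "__main__":
    sources = ["Acme Inc", "Globex Corporation", "Initech LLC"]
    targets = ["ACME Inc", "Corporation Globex", "Initeck LLC", "Umbrella Co"]
    scores = best_scores(targets, sources, floor=0.5)
    print(matches_per_cutoff(scores, [0.6, 0.65, 0.9, 0.95]))
```

A large jump in the count between two adjacent cutoffs is the place to inspect the records by hand before settling on a threshold.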