I just have a basic question as I'm learning the Fuzzy Match tool. I was playing around with the Sample workflow within Alteryx.
1) I can't get my head around why the output is duplicating rows. In this case WOLFMAN CO pops up twice, even though it's only in the original dataset once. It also has two different match keys. Is that why it's output twice?
2) Also, is it standard to put the Master List that you're comparing your working dataset against at the bottom of the Union tool? How does the Fuzzy Match tool distinguish between the Master List (the standardized list) and the Raw List that needs to be cleaned?
Thanks for everyone's help.
Hi @whitkrieng
Both are great questions about the Fuzzy Match tool. Answers are below, with an example workflow for the second question:
1. Yes, depending on the fidelity of the matches and how keys are configured to be output, some relationships will appear twice because two separate match keys have caused the values to be linked. This is why it's a good idea to standardise the output of the Fuzzy Match with a Make Group tool - as in the "Fuzzy Match > Make Group > Find Replace" example included with Alteryx Designer - or to sort by match score and then unique on Field1 and Field2.
2. Yes, depending on the aim of the Fuzzy Match you can use the tool to standardise values across different data sources via 'Merge' mode. By unioning your values and ensuring that there is a column defining the source of the data, you can prevent the Fuzzy Match tool from matching against itself by selecting 'Merge (Only Records from a Different Source are Compared)' and then selecting your source column as an identifier. The attached workflow shows how you can prep your Master list and Raw data for Fuzzy Matching.
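To make the 'Merge' idea in point 2 a bit more concrete, here is a rough Python sketch of what "only records from a different source are compared" means. This is only an illustration of the concept, not how Alteryx implements it internally. The record IDs, the `source` column name, and all names other than WOLFMAN CO are made up for the example.

```python
from itertools import combinations

# Hypothetical rows after a Union tool: every record carries a column
# saying which input it came from (Master list vs. Raw data).
records = [
    {"id": "M1", "source": "Master", "name": "WOLFMAN CO"},
    {"id": "M2", "source": "Master", "name": "ACME INC"},
    {"id": "R1", "source": "Raw", "name": "WOLFMAN COMPANY"},
    {"id": "R2", "source": "Raw", "name": "WOLFMAN CO."},
]


def cross_source_pairs(rows):
    """Yield only the pairs whose two records come from different sources."""
    for a, b in combinations(rows, 2):
        if a["source"] != b["source"]:
            yield a["id"], b["id"]


pairs = list(cross_source_pairs(records))
for pair in pairs:
    print(pair)
```

The two Raw records look very similar to each other, but they are never paired because they share the same source value. The order of the inputs in the Union does not matter in this sketch; only the source column does.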
Hope this helps!
Thank you for your response!
Another question about the multiple keys: why is the Fuzzy Match tool generating multiple keys?
For Wolfman there is a MatchKey of AFLM and one of FLFM. Does that mean there can be different variations of "Wolfman"?
Thanks.
Correct - based on the key generation style chosen, multiple keys can be generated for each record to cross-check and attempt to get any matches that might be possible. This is also why there tends to be a very similar string of tools across just about every fuzzy match, where the match itself is often followed by a Unique tool to remove any of those duplicated matches across keys.
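As a rough illustration of why multiple keys lead to duplicated rows, and why a "sort by score, then Unique" step cleans them up, here is a small Python sketch. The two key functions are invented for the example and are not the algorithm Alteryx uses to produce keys such as AFLM or FLFM. The scoring uses Python's `difflib` as a stand-in for the real match score.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records; the IDs and all names except WOLFMAN CO are made up.
names = {1: "WOLFMAN CO", 2: "WOLFMAN COMPANY", 3: "ACME INC"}


def toy_keys(name):
    """Return two made-up keys per name (NOT Alteryx's key generation)."""
    word = name.split()[0]
    consonants = "".join(c for c in word if c not in "AEIOU")
    return {"first4:" + word[:4], "cons:" + consonants[:4]}


# Group record IDs under every key they generate.
buckets = defaultdict(list)
for rec_id, name in names.items():
    for key in toy_keys(name):
        buckets[key].append(rec_id)

# Each key that two records share produces its own match row.
matches = []
for key, ids in buckets.items():
    for a, b in combinations(sorted(ids), 2):
        score = SequenceMatcher(None, names[a], names[b]).ratio()
        matches.append((a, b, key, score))

# Sort by score (highest first) and keep one row per record pair,
# similar in spirit to a Sort followed by a Unique on the two ID fields.
best = {}
for a, b, key, score in sorted(matches, key=lambda m: m[3], reverse=True):
    best.setdefault((a, b), (key, score))

print(len(matches), "raw match rows ->", len(best), "unique pair(s)")
```

Records 1 and 2 share both toy keys, so the same pair shows up twice before the de-duplication step and once after it.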