community
cancel
Showing results for 
Search instead for 
Did you mean: 

Engine Works Blog

Under the hood of Alteryx: tips, tricks and how-tos.
Alteryx Alumni (Retired)

Is Fuzzy Matching a mystery to you? You're not alone. Do you wish it were easier? Of course--who doesn't like things to be easier? Let's learn why and how we use this powerful methodology, and meet a new friend that makes it easier than ever: the Fuzzy Unique Macro!

 

First, in case your parents never shared this one with you, here's the cliffnotes on the story of the dawn of Fuzzy Matching, born in the advanced analytics group at the rapidly-growing Super Positron Inert Thrusters (SPIT), Inc. of West Han Plains, Kashyyyk.

 

So if you're familiar with conditions on the Wookie home planet, then you know that the thick vegetation on Kashyyyk meant cellular coverage was spotty at best. This combined with the mostly-vowel dialect of West Han meant the phone operators at SPIT had difficulty entering correct customer information. The telecoms eventually upgraded to 91g, but not before Mr. Aacaa Woeegee of 129 Raandoo Meadows found himself in the SPIT CRM system 11 different times!

 

Aacaa had been a loyal customer for years, buying all his positron thrusters from SPIT (he was there back when they were still Super Colliding Universe Machines). But every time Aacaa phoned into SPIT, the operator would invariably mishear--and hence misspell--at least part of his info.

 

No big deal? Well, let's see...

 

Every different spelling meant a virtual 'individual' customer in their CRM. And Aacaa wasn't the only one; this problem was rampant all across SPIT's system. SPIT marketing couldn't identify any of their best customers to reward their loyalty, and SPIT management was foaming at the mouth, (which, to be fair, is somewhat common with Wookies.) The problem was so detrimental to SPIT's ability to perform business analytics that they developed a process--a set of complex algorithms--to determine who among these slightly varying entries in their CRM are actually the same individual. When SPIT was done, they found that each West Han Plains resident was entered in their database an average of 3.8 times! Not only did they shrink the size of their bloated CRM and now it's running much faster, but more importantly, they could now identify who their customers really are!

 

These powerful techniques spread across galaxies and years, known on Earth today as "Fuzzy Matching"--the "fuzzy" being an historical homage to the fur-covered giants of both height and analytics that brought us this knowledge. Unfortunately, what was lost somewhere in space and time was the way in which SPIT made fuzzy matching so famously easy. Lost, that is, until Alteryx.

 

Yep, Alteryx--like the Wookie--is famous for making wickedly complex processes ridiculously easy, so in that vein:

 

Introducing the Fuzzy Unique Macro! The Fuzzy Unique Macro is Alteryx's best approximation of the original, legendarily-simple-yet-epically-powerful 'Wookie Purge Mode Fuzzy Match'. Let's have a look! Download the demo module attached at the bottom of this blog entry so you can play along.

 

If you've used Alteryx for more than a few minutes then you already love the Unique tool for deduping clean data; the Fuzzy Unique macro lets you dedupe people--or Wookies, Droids and Sarlaccs--as easily as the regular Unique tool, but with the power of approximation.

Take a look at Aacaa's 11 different entries in the SPIT CRM. Notice the slight variations in spelling on the name and address columns:

 

 

Let's run this data set through both the regular Unique tool and the Fuzzy Unique macro and see what happens. Here are the 2 tool configurations side-by-side: not much different; equally easy. Instead of checking 4 boxes in the Unique tool at left, you pick the same 4 columns from dropdowns in the Fuzzy Unique on the right. That's it!

 

 

So while the interfaces are very similar, the results are anything but. Those slight variations in spelling you see mean that--to the Unique tool--you have 11 different records, and as such all 11 entries come out the 'U' side of the Unique tool.

 

 

But no need to feel as if millions of voices cried out in terror and then were suddenly silenced. Instead, your Fuzzy Unique macro did exactly what you wanted: it analyzed the records and saw that--although not perfectly identical--these 11 records represented the same Wookie, and hence just 1 record comes out the 'U' Unique side of the Fuzzy Unique macro; the other 10 have been identified as virtual duplicates and hence flow out the 'D' Duplicate side of the macro.

 

Furthermore, when you look at both Fuzzy Unique outputs, you see the FuzzyPersonID column. This ID column identifies a unique individual, and since all 11 records are for the same individual, all of them--the unique row and the duplicate rows--are assigned the same FuzzyPersonID. This ID can then be used to subsequently capture and consolidate all the data that's spread among the disparate records. When complete, you'll know all about Aacaa, and you'll see he's the kind of loyal customer you want to take good care of.

 

How does it work? Well here's the best part about macros: Who Cares!! Like your phone, your remote control, or your positron thruster, it doesn’t matter how a macro actually works; you’re just happy it does! All the complexity of the component tools is wrapped up and packaged so you don’t need to see or understand the internals. Isn't that the best?

 

Almost. In truth, that's really the second best part about macros.

 

What's really the best part about macros? It's that you can see exactly how they work. They're macros! That means it's like any other Alteryx workflow--a set of tools combined on a canvas--but it's all encapsulated for simplicity and reuse. So when you do get curious to find out how a macro works, just right-click and open it up!

 

Not everyone knows--or cares to know--exactly how the built-in, ultra-powerful and ultra-configurable Fuzzy Match tool works, or how it ties in with the Make Group tool and the other best practices recommended for fuzzy matching. You can open the macro and learn the detailed tool-by-tool setup, or just use it in comfy blissful ignorance--one of my very favorite places to be.

 

So how 'bout it? Go try the Fuzzy Unique macro on your own database and see how easy it is to clean it up and start treating your customers as the great individuals they are.

 

Stay tuned for more Fuzzy Match magic in the upcoming blog on the Fuzzy Join macro, or what folks around here call it: "The Fuzzy Join Macro."

Comments

Hi Dan,

WOW!!!

Spoiler
Spoiler
Smiley Very Happy

This is an insane macro - i've just right clicked and had a look under the bonnet (hood) and it's crazily complicated, but absolutely brilliant.

The fantastic thing is that i didn't have to do that, i could have just pressed go and it would have given me the output i needed.

i was just a bit curious.

Thanks for creating this macro - it might just be a lifesaver as i will be fuzzy matching businesses across different companies within my company.  The business has grown through acquisition with lots of different brands and operating systems and field names.  So matching is a real challenge.  This looks like it will make it a breeze.

You've even added descriptions on what is happening in each tool - superb addition.

 

Labels