This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
I have a table that is unidentifiable linked to other data sources except for the fact that I can reproduce a composed key that would fully match in some cases and partially match in many other cases.
I'm trying to join letters on the address field (like John Doe Wallstreet 12 1000AA Amsterdam). On the other hand I have three data sources for different types of receivers. From all three sources I've reproduced a column that follows the format pattern in [letters].[address] an union joined them together to a single source with a field [AddressType].
In my last step I want to join my [letters] and union joined table together on the address fields. I've noticed 97 exact matches but also 903 unmatched from my 1000 letter sample file. I've applied data cleansing on new lines, double white spaces and trimming whitespaces (not all)
Is there a way to fuzzy join the tables? E.g. when the addresses have a similarity up to 90% or so?
Hi @BenFransen! Let's see if I can help. I'm a little uncertain about the exact context of your situation, but I"m hopeful that the Fuzzy Match tool's Merge Mode will help you. The Fuzzy Match tool's Merge mode can fuzzy match records from 2 sources based on specified field(s) (in your case [letters]). Since the tool only has 1 input connector, you have to Union the data sources together. You'll want to add a new field on each source before the union so you can differentiate which source each record came from. The Fuzzy Match tool also requires a unique identifier for each row (i.e. drop a RecordID tool right before it). Now you can configure the Fuzzy Match tool like this:
I attached a workflow that does the prep work and the above configuration so everything is better explained. I recommend checking "Output Match Score" so you can see how strong the match is. You'll see that I have "Custom" chosen as my match style. Click "Edit" next to it to see the complex configurations you can choose for fuzzy matching. I chose Custom since you're working with just letters but one of the other default match styles may be more appropriate for you. Fuzzy Matching is an art, so I'd encourage you to play around a bit with the match styles and their embedded options.
I'd encourage you tool look at the built in Fuzzy Matching samples in the Designer (Help --> One-Tool Examples --> Join --> Fuzzy Match) and watch the videos on how fuzzy matching work here: http://www.alteryx.com/virtual-training --> Watch a Past Session. Hope this helps!
Thanks for the time you took to reply to my question. I now see I have a different expectation from the fuzzy join. I was under the impression I could join any source together and apply fuzzy matching on two columns and if a match score was above the accepted threshold it would be joined.
Now I see I have to union the results first, then do fuzzy magic, and then go on. I'll give that all a try and see how it plays out (3rd day using my Alteryx demo ;)) I'll accept your answer.