Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Joining 2 tables based on Address - the address is slightly different

Amalainic
6 - Meteoroid

Hello everyone, 

 

Today I have a slightly complex (for me!) task to accomplish: while using 2 public datasets which contain information about UK properties, I need to join them together based on their "Address" field. The problem is that these addresses are slightly different, for example:

 

  • "Huffington 29" vs "29 Hufington"
  • "170 Mitcham Road, London (SW17 9NJ)" vs "170 Mitcham Road, London"

 

I need to clean the data somehow first and then join the tables, or do an "approximate" join. I tried the Fuzzy Match tool but unfortunately I didn't get any results. Some ideas?

 

The sample data is attached below. I also included a Postcode column, but there are a lot of nulls so I'm not sure if that can be used properly. 

 

Thank you for your help.

2 REPLIES 2
mceleavey
17 - Castor
17 - Castor

Hi @Amalainic ,

 

If you have two datasets that don't share a common key , then you are basically looking at Fuzzy Matching, there is no other way.

The first thing to do is join them both on the postcode field, which you might need to standardise first (remove spaces, make uppercase etc.)

Then, take the output of the left and right, representing the unmatched records, and they represent source 1 and 2 in the workflow I've attached.

Fuzzy matching is not an exact science, hence fuzzy, but in the example I've attached using your two examples, you can see the matches are applied with a high degree of likelihood.

 

mceleavey_1-1620810472337.png

mceleavey_2-1620810490599.png

 

I hope this helps.

 

M.

 



Bulien

Amalainic
6 - Meteoroid

Thank you for your help. It's weird that when I use my data (even though I sampled just 5-10% of it which is 80-100k rows) the fuzzy match tool gets stuck at 51-52%. Any ideas why? I waited a lot for it to compile but no luck. 

 

You can use my sample data that I posted, the 2 excels, if you want to try.

Labels