Important Community update: The process for changing your account details was updated on June 25th. Learn how this impacts your Community experience and the actions we suggest you take to secure your account here.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Comparing 2 different files (same field)

mshinn9999
7 - Meteor

Hello all.  I am trying to figure out the best approach for this.  I have File 1 (with a single field Name) with say 300K records. Then, I have another file which sources from another application with the same Name field (but has only about 1K records).  I am trying to understand how likely there would be close matches in the Name field between the 2 separate files. 

Is the Fuzzy Match tool (MERGE) the best approach here?  Or, is there something a bit easier tool to leverage here?  I did not want to over-complicate this. 

Mike

11 REPLIES 11
jdminton
12 - Quasar

I would say possibly. The fuzzy match tool can be very helpful. but you will often get false positives even if everything is set up correctly. Why are you considering fuzzy match to straight join? I'm guessing there are probably formatting differences or possibly variation in user input, etc.

mshinn9999
7 - Meteor

Hello there.  To add more color here, I will need to create a calculation based on the 2 different files (data sources).  TBH, I would like to avoid the Fuzzy Match all together.  I can use the combination of the Data Cleaning/Formula tools to have the data format be quite similar.  I guess I'm trying to see if there is something similar to say a Vlookup to get a sense of what percentage of likely matches there are between the 2 files.  I hope this makes sense.  It's a data profiling exercise.  

OllieClarke
15 - Aurora
15 - Aurora

@mshinn9999 

If you're comparing a single string field between 2 datasets, then the Find and Replace tool can be useful here. It has more flexibility than a join tool (can find partial matches, or case insensitive ones), and if you use it in append mode, then a browse tool after showing the % of non-nulls in the appended column will tell you what % found a match.

jdminton
12 - Quasar

1000 records is getting a bit large for find/replace in my opinion. I would recommend using Uppercase() to remove any case issues with name. It sounds like the names should be pretty similar. The join tool will provide any fallout and you can use a fuzzy match on the fallout from both sides to identify possible issues if desired.

mshinn9999
7 - Meteor

Thank you both for the replies. Attached is sample workbook.  How would this look like per implementation?  Data of course is all dummied up.

 

Mike

 

OllieClarke
15 - Aurora
15 - Aurora

@jdminton on my machine with 300k/1000 records the Find and replace is still significantly faster than the formula+join combo, plus it has partial match capabilities.

image.png

There will be a tipping point where that might no longer be true though

OllieClarke
15 - Aurora
15 - Aurora

Hi @mshinn9999 

could you possibly share the excel your workflow references too please?

mshinn9999
7 - Meteor

Yes of course and thank you for the assistance here.  Here are all of the relevant files.

OllieClarke
15 - Aurora
15 - Aurora

@mshinn9999 here's how the Find and Replace tool could work (either at the end of your data prep, or even earlier on)

image.png

Labels