Hello,
I'm trying to find a way to compare multiple entities in a data set to check whether they match. For example, if I have:
Alteryx Company
Alteryx Company LLC
Alteryx Company Limited Corporation
I want to perform a fuzzy match to find discrepancies and so forth, which I already know how to do. The problem is that it takes too long (hours and hours), because the data set has over 50 thousand records. I think I found a workaround: split the records into groups by the first letter of the name. I did that, and the Summary tool gave me the results below. So I thought about filtering the data set into 38 different outputs without manually dragging 38 Filter tools into the workflow. I've asked Copilot and it says a batch macro can help, but I'm not an advanced user yet. Can someone please help me, or advise me on a better way to tackle the problem?
Thank you!
First Name Letter | Record Count |
H | 2335 |
S | 5481 |
2 | 33 |
G | 2432 |
: | 1 |
14 | |
Z | 225 |
4 | 12 |
P | 3036 |
 | 1 |
M | 3772 |
U | 992 |
J | 1080 |
F | 2373 |
Q | 185 |
I | 1524 |
Y | 219 |
E | 1952 |
6 | 3 |
1 | 37 |
K | 1039 |
L | 2177 |
W | 2093 |
V | 1006 |
O | 1339 |
7 | 4 |
9 | 2 |
8 | 5 |
B | 4301 |
R | 2095 |
A | 4514 |
5 | 7 |
T | 2480 |
C | 6272 |
N | 2365 |
X | 70 |
D | 1750 |
3 | 26 |
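The first-letter split described above is a form of what is usually called blocking: each record is compared only against records in the same group, so the number of pairwise comparisons drops. A minimal Python sketch of the idea outside Alteryx, assuming the names arrive as a plain list of strings and using the standard library's `difflib.SequenceMatcher` ratio as a stand-in for the Fuzzy Match tool's scoring. The helper names and the 0.85 threshold are hypothetical. The loop over blocks plays roughly the role that a batch macro running once per letter would play in a workflow.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(name: str) -> str:
    """Blocking key: first character of the trimmed, upper-cased name ('' if blank)."""
    return name.strip().upper()[:1]


def pair_count(block_sizes):
    """Pairwise comparisons needed when each block is compared only internally."""
    return sum(n * (n - 1) // 2 for n in block_sizes)


def fuzzy_pairs_by_block(names, threshold=0.85):
    """Compare names pairwise, but only within the same first-letter block.

    Returns (name_a, name_b, score) tuples whose SequenceMatcher ratio is at
    least `threshold`. The 0.85 default is an illustrative assumption.
    """
    blocks = defaultdict(list)
    for name in names:
        blocks[block_key(name)].append(name)

    matches = []
    # One iteration per block, similar to one batch-macro run per letter.
    for key in sorted(blocks):
        for a, b in combinations(blocks[key], 2):
            score = SequenceMatcher(None, a.upper(), b.upper()).ratio()
            if score >= threshold:
                matches.append((a, b, round(score, 3)))
    return matches
```

With the three sample names from the question, only "Alteryx Company" and "Alteryx Company LLC" clear the 0.85 threshold (ratio about 0.88); "Alteryx Company Limited Corporation" scores 0.6 against the shortest name, which is one reason normalizing legal suffixes before matching tends to help. Blocking by first letter also never compares names whose first characters differ (for example "The Alteryx Company" vs "Alteryx Company"), so pairs like that need a separate pass.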
@Mzacr Updated workflow attached. I used your input above for the first Text Input tool.
You might want to look at some other techniques as well. Fuzzy matching is a rather iterative process, because you want the algorithms to find as many matches as possible without producing false positives.
Take a look at the sample under "Help > Sample Workflows > Scripting and Automation > Build a Macro > Merge to a master file with Fuzzy Match". It gives a good rundown of how to match what you can first and then match more.
So, in your list of 57k records, you want to match what you can and reduce the list size. Once you figure out matches, start adding them to a master list so that you don't need to match them again.
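The match-what-you-can-then-reduce idea can be sketched in Python as a two-pass cascade. This is a hypothetical illustration, not the logic of the Alteryx sample workflow. Pass 1 normalizes each name (upper-case, punctuation removed, trailing legal-form words stripped using an assumed suffix list) and resolves exact matches on that key; pass 2 fuzzy-compares only the keys that survived, within first-letter blocks. Every record resolved in pass 1 is one the slower fuzzy pass never sees. `LEGAL_SUFFIXES`, `normalize`, `cascade_dedupe`, and the 0.9 threshold are all illustrative names and values.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical suffix list for illustration; extend it for your own data.
LEGAL_SUFFIXES = {"LLC", "LTD", "LIMITED", "INC", "CORP", "CORPORATION"}


def normalize(name: str) -> str:
    """Upper-case, replace punctuation with spaces, strip trailing legal-form words."""
    tokens = re.sub(r"[^\w\s]", " ", name.upper()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def cascade_dedupe(names, threshold=0.9):
    """Two-pass matching: cheap exact pass first, fuzzy pass on what is left.

    Returns (master, matches): master maps normalized key -> first name seen
    with that key; matches lists (name, master_name, pass_label) tuples.
    """
    master = {}
    matches = []

    # Pass 1: exact match on the normalized key (the "master list").
    for name in names:
        key = normalize(name)
        if key in master:
            matches.append((name, master[key], "exact"))
        else:
            master[key] = name

    # Pass 2: fuzzy-compare only surviving master keys, blocked by first character.
    blocks = defaultdict(list)
    for key in master:
        blocks[key[:1]].append(key)
    for keys in blocks.values():
        for a, b in combinations(keys, 2):
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                matches.append((master[b], master[a], "fuzzy"))
    return master, matches
```

For example, `cascade_dedupe(["Alteryx Company", "Alteryx Company LLC", "Alteryx Company Limited Corporation", "Alteryx Compnay Inc.", "Bravo Ltd"])` folds the three "Alteryx Company" variants into one key in pass 1, and pass 2 catches the misspelled "Alteryx Compnay Inc." (ratio about 0.93). For simplicity the sketch does not remove fuzzy-matched keys from `master`; a fuller cascade would merge them and could add further passes with looser thresholds, checking each pass for false positives.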
You're amazing! Thank you very much for your quick turnaround!
Understood! Yes, the idea is that I have only one data set, from a single source, and I want to ensure there is no fishy business in the records. One test is to ensure that similar records don't have duplicate values and such. I'll give it a try. I believe that sample follows a cascading approach, which would be very helpful indeed; the only thing I had a hard time cracking was the embedded macros hahaha.