Hi, fairly new here,
I have a flow with several thousand company names. I want to eliminate duplicates. I do an exact match to start.
Then I pass them through some fuzzy match logic. This first threshold is narrow and outputs ~50 fuzzy-matched company names.
I then pass them through more fuzzy match logic. This second threshold is wider and outputs ~100 separate fuzzy-matched names.
I then need to match them to an internal data set of company names. Again, more fuzzy match logic and the narrow threshold.
And then pass them through again at the wider threshold.
I want to manually verify each of these fuzzy matches before continuing on with the next step of the process.
With my current setup, I was going to:
However, I would need to do this multiple times for each new iteration/threshold of fuzzy matching.
Is there any way to manually validate rows mid-stream (whether through an analytics app or some magic tool)?
Grateful for any insights or recommendations.
Thanks,
Noah
For this case, it would be better to process all the data at once rather than repeatedly stopping and restarting in the middle.
To evaluate how different two strings are, "edit distance" comes in handy: it quantifies the difference, so you can treat it as a number.
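To make "edit distance" concrete, here is a minimal Python sketch of Levenshtein distance (one common edit-distance measure; the Fuzzy Match tool offers several algorithms, and this is just an illustration, not the tool's internals):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits (insert, delete,
    substitute) needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a
    # and b[:j]; we build one row of the DP table at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]

print(levenshtein("Acme Corp", "Acme Corp."))  # 1 (one inserted ".")
```

Because the output is a plain number, you can sort or filter candidate pairs by it instead of eyeballing strings one at a time.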
For this purpose, I would use the "Fuzzy Match" tool.
Please take a look at: Tool Mastery | Fuzzy Match.
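To show the "process everything at once" idea, here is a rough Python sketch that scores every pair in a single pass and sorts the results into an auto-accept bucket and a single review bucket, so you do your manual verification once instead of after each threshold. It uses Python's standard-library `difflib.SequenceMatcher` as a stand-in similarity score, and the two threshold values are illustrative assumptions you would tune for your data:

```python
from difflib import SequenceMatcher

# Hypothetical thresholds, not tool defaults -- tune for your data.
NARROW = 0.95   # near-certain duplicates: accept automatically
WIDE = 0.80     # plausible duplicates: send to manual review

def bucket_pairs(names):
    """Score every unique pair of names once and split the pairs into
    auto-accepted matches and a single pile for manual review."""
    auto, review = [], []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Case-insensitive similarity in [0, 1].
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= NARROW:
                auto.append((a, b, round(score, 3)))
            elif score >= WIDE:
                review.append((a, b, round(score, 3)))
    return auto, review

auto, review = bucket_pairs(
    ["Acme Corp", "Acme Corp.", "Acme Corporation", "Globex Inc"])
```

Writing the `review` bucket to one file (or Browse tool) at the end of the workflow gives you a single verification step covering all thresholds at once.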