Hi everyone, looking for some fuzzy match help here.
The attached workflow is doing what it is supposed to do: performing a fuzzy match between a "dirty" and a "clean" dataset. However, I don't understand how to tell it that the "group" should be the name that comes from the clean dataset.
The "clean" list should be:
GLORY INC
DENNIS
JOE HOTDOG CO
Somehow it's all messed up. Thanks a lot for the help!
hey @m_v
slight mod to your workflow to include an ID field, i believe this is what you are looking for.
hope this helps!
Thanks for taking a look at this. The answer that I'm looking for would have the "clean" names in the joined records. So it would have "GLORY INC" and not "GLORY CONSTRUCTION".
Hi m_v
I studied the Make Group tool earlier this week, and it puzzled me how its Group column output comes to be; it happens like magic, and nothing explained its magic trick 😀. Now your question has pushed me to run some tests. I am sharing my findings, hoping that we can understand its sorcery.
Group Data at a glance:
The sorted Company_name values are shown here; the ones highlighted in orange are the groups returned by the Make Group tool, which are not right 100% of the time.
I manually added the number 1 to GLORY INC and JOE HOTDOG CO, and appended a Z to MV; forcing the first two to appear first on any sort, and MV likely last.
After running the third WF with this new data, 1JOE HOTDOG CO and 1GLORY INC became the chosen Groups and MV Z was not selected.
Conclusions:
I hope you can reply once you have verified this hypothesis with a large sample; if you do, that will clarify the steps required to properly handle Groups and Fuzzy Matches.
Hope this helps,
Arnaldo
Hi Arnaldo, many thanks for doing this analysis.
It is becoming clear to me that the "Make Group" tool does not do what I was assuming it does.
To achieve my goal I really need to use a sample tool after the fuzzy match to select the best match (in case there are multiples), and then I can join it with the original dirty data.
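For anyone who wants to see the same idea outside of Alteryx, here is a minimal Python sketch of "keep only the best match per dirty record, then join the clean name back on". It is an illustration only, not what the Fuzzy Match tool does internally: `difflib.SequenceMatcher` stands in for the match score, and the function names, the `threshold` parameter, and its 0.5 default are all assumptions made up for this example.

```python
from difflib import SequenceMatcher

# Clean list from the original question.
CLEAN = ["GLORY INC", "DENNIS", "JOE HOTDOG CO"]


def score(a, b):
    """Similarity in [0, 1]; a stand-in for a fuzzy match score (assumption)."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def best_clean_match(dirty_name, clean_names, threshold=0.5):
    """Return the highest-scoring clean name, or None if nothing reaches threshold.

    This mirrors 'sort by match score descending, then sample the first row'.
    """
    candidates = [(score(dirty_name, c), c) for c in clean_names]
    candidates = [pair for pair in candidates if pair[0] >= threshold]
    if not candidates:
        return None
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return candidates[0][1]


def attach_clean_names(dirty_names, clean_names, threshold=0.5):
    """Join step: pair every dirty name with its best clean name (or None)."""
    return [(d, best_clean_match(d, clean_names, threshold)) for d in dirty_names]
```

The point of the design is that the label always comes from the clean list, so a dirty variant can never be chosen as the "group" name. Dirty names with no candidate above the threshold come back paired with `None` rather than being dropped, which corresponds to keeping the unmatched records when joining back to the dirty data.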
Hi m_v
I noticed that in your final solution you dropped the Make Group tool, replacing it with Sort and Sample tools. A very clever approach, and probably easier to understand.
Cheers,
Arnaldo