Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Fuzzy Match question

m_v
8 - Asteroid

Hi everyone, looking for some fuzzy match help here.

 

The attached workflow is doing what it is supposed to do: performing a fuzzy match on a "dirty" and "clean" datasets. However, I don't understand how to tell it that the "group" should be what comes from the clean dataset. 

 

The "clean" list should be:

GLORY INC
DENNIS
JOE HOTDOG CO

 

Somehow it's all messed up. Thanks a lot for the help!

m_v_0-1680550103284.png

 

5 REPLIES 5
gautiergodard
13 - Pulsar

hey @m_v 

slight mod to your workflow to include an ID field, i believe this is what you are looking for.

gautiergodard_0-1680551344468.png

 

hope this helps!

m_v
8 - Asteroid

Thanks for taking a look at this. The answer that I'm looking for would have the "clean" names in the joined records. So it would have "GLORY INC" and not GLORY CONSTRUCTION.

m_v_0-1680551594804.png

 

ArnaldoSandoval
12 - Quasar

Hi m_v

 

I studied the Make Group tool early this week, and it puzzled me how its Group column output come to be, it happens like magic and nothing explained its magic trick 😀; Now your question forced me to do some tests, I am sharing my findings, hoping that we can understand its sorcery.

 

Group Data at a glance:

FuzzyMatch-01.png

  • I highlighted green the expected Group values; it seems the tool's group takes the group name from the sorted Key field. 😮
  • I proved this idea with your data using a copy of your workflow; the second WF in the attached file; once you run it, look at the Sort tool output.

FuzzyMatch-02.png

The sorted Company_name are here; the oranges are the groups returned by the Make Group tool, which are not right 100% of the time.

FuzzyMatch-03.png

 

  • Now, based on the newly found info, I modified your clean data on a new WF, as shown:

FuzzyMatch-04.png

I manually added the number 1 to GLORY INC and JOE HOTDOG CO, and appended a Z to MV; forcing the first two to appear first on any sort, and MV likely last.

After running the third WF with this new data, 1JOE HOTDOG CO and 1GLORY INC become the chosen Groups and MV Z was not selected.

 

Conclusions:

  1. It is my opinion the the Make Group tool select the Group Name by sorting the keys alphabetically.
  2. I only test my theory with a very small sample, your data, only 8 records.
  3. Ok, the idea is not to manually prefix with a '1' all the company names from the Clean Data data stream, that will be a lot of work, I suggest adding a formula immediately after loading the Clean Data, and strip it (the number 1) at the end of the workflow, after the final join. It is the last workflow in the attached file.

I hope you can reply back once you verify this hypothesis by using a large sample, if you do that will clarify the steps required to properly handle Groups.and Fuzzy Matches.

 

Hope this helps,

Arnaldo

 

m_v
8 - Asteroid

Hi Arnaldo, many thanks for doing this analysis.

 

It is becoming clear to me that "make group" tool does not do what I was assuming it does. 

 

To achieve my goal I really need to use a sample tool after the fuzzy match to select the best match (in case there are multiples), and then I can join it with the original dirty data. 

m_v_0-1680614462943.png

 

ArnaldoSandoval
12 - Quasar

Hi m_v

 

I noticed that in your final solution you dropped the Make Group tools, replacing it with a Sort and Sample tools, very clever approach and probably more easy to understand

 

Cheers,

Arnaldo

Labels