Missed the Q4 Fall Release Product Update? Watch the on-demand webinar for more info on the latest in Designer 24.2, Auto Insights Magic Reports, and more!
Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Can’t Figure Out Fuzzy Match Tool Workflow at All

slaurel
8 - Asteroid

Hi, I have been attempting to use the fuzzy match tool for something, and I have had no luck. I tried to use RegEx formulas but apparently Fuzzy Match is the only tool in Alteryx that will return values that have close matches, such as 80% match. 

 

Here are my details:

- I have an Excel (xslx) file that has two tabs. The first tab is called First Copy and it has 2 columns ("Names" and "ID"). The second tab is called Organization. In that tab, I have 3 columns ("Level", "Organization Name", "Date"). "Names" from the first tab and "Organization Name" from the second tab have the exact same information, but some of the wording is slightly different for some of the cells. Some are already exact matches. In the second tab, "Level" is the column that has the parent/child entity numbering. For example, here is what column A & B look like in the second tab:

 

 

1Marketplace LLC
2Engine Limited
3Gallery
3Moon
3Star
3Dynamic
2Chair
3Email
3Dataset
3Custom Tools
4Fuzzy

 

 - "1" means Parent entity. "2" means child entity, "3" etc means child of that child. So in this example, "Marketplace LLC" is the overall parent entity. "Chair" is a child of Marketplace LLC that has its own children, such as Email, Dataset, and Custom Tools. Then, Custom tools has its own child - Fuzzy. And there is a huge list of these.

 

Here is where I am getting stuck / what the ask is:

 

-In the first tab in my file, there is no "Level" column that links these together. So, I've been working on a small workflow. It needs to have some sort of join/fuzzy match that compares Column A from the first tab and Column B from the second tab (so just the names). If there is no close match, it returns a blank. At the end, I want it to return 2 columns (Org Names and First Copy). The parent entity with all of it's children and the children's children underneath should be returned. They are already in order in the file. There are also some cases where an entity is in the first tab but not in the second tab. That is when it should return blank in the column of the tab is wasn't found in. For example, if "Vision LLC" was found in the first tab but not the second, it should return "Vision LLC" in First Copy column and "Not found" or a null in the Org Names column. 

 

- My Alteryx workflow so far is two inputs (for each tab), data cleansing tools on each. I tried various methods (appending/unioning/joining/etc) and then attaching the fuzzy match tool, but it never works. The tool itself is hard for me to navigate as well. Any help here would be greatly appreciated, thank you. 

5 REPLIES 5
CoG
14 - Magnetar

The Fuzzy Match Tool is the tool you'll want to use, and Union is how you'll need to combine the data in order to use the tool. Prior to doing that however, use the Formula Tool to add a column [SourceID], set to 1 for first tab and 2 for second tab to use "Merge Mode" in Fuzzy Match Tool. It can also be beneficial to add [RecordID] just before Fuzzy Match Tool so that you can identify exact position of matches. You can check "Output Unmatched Records" to see records that didn't match, although it will sometimes seemingly break (I don't know why it does this) and output matched records as unmatched too. It may be better to determine unmatched records after the fact because of this weird behavior.

 

NOTE: Make sure that the columns you will be matching on have the same name before doing the Union, so that all match data is in the same column.

Screenshot.png

Screenshot 2.png

If you share some sample data for Tab 1 and 2, I'd be happy to try to throw together a sample workflow for you based on what I understand of the problem so far.

slaurel
8 - Asteroid

@CoG Sure!

 

This is a sample of tab 1 (this tab does not have the entities in order with their children. It is completely random, but for this purpose I went ahead and made them in order):

NamesID
Seeds IncP0123
BearP0465
ForestFC3747
Grass S.A. - CropsFW385
Trees inc. A.SJF7749
LeavesWE99
Stump Ltd450PQ
School LLCFM01211
Rooms465732
BusesGP4577
Students incJ1234
  
  
  
  

 

Tab 2 (this one is already in order always):

 

LevelOrganization Name
1Seeds Inc
2Bear
3Forest
3Grass S.A. - Crops
2Trees inc. A.S
3Leaves Limited
1School LLC
2Rooms 
3Buses
4Students inc

 

 

The output should look like this (2 columns). "Leaves" is an example where the wording is slightly different. It is "Leaves" in the first tab but "Leaves Limited" in the second tab. 

 

First Copy Names (first tab results)Organization Names (second tab)
  
Seeds Inc 
BearBear
ForestForest
Grass S.A. - CropsGrass S.A. - Crops
Trees inc. A.STrees inc. A.S
LeavesLeaves Limited
Stump ltdNOT FOUND 
School LLCSchool LLC
RoomsRooms
BusesBuses
Students inc

Students inc

CoG
14 - Magnetar

Here is a sample workflow that I build for you. The difficulty with the fuzzy match tool is that its fuzzy and may make inappropriate associations or fail to make an association. Thus it is important to check the output, but this workflow and the Fuzzy Match Tool can help speed up the association process.

 

_Main.png

CoG
14 - Magnetar

Per direct request, here is a modified version of the workflow that takes into account the initial ordering of Tab 2 data.

 

This was accomplished by modifying the second chained join to include the RecordID followed by the addition of a Sort Tool to achieve the desired effect.

 

Let us know if this works! Happy Solving!!!

CoG
14 - Magnetar

I'm not quite sure what the error you got was. Here is another variant of the workflow. If this still doesn't work if you could provide some more information that would be helpful. When I ran the workflow, the output I got was in the same order as Tab 2 except for unmatched records from Tab 1 which are just Union-ed on to the rest of the data:

Screenshot.png

Labels
Top Solution Authors