Hi, I have been attempting to use the fuzzy match tool for something, and I have had no luck. I tried to use RegEx formulas but apparently Fuzzy Match is the only tool in Alteryx that will return values that have close matches, such as 80% match.
Here are my details:
- I have an Excel (xslx) file that has two tabs. The first tab is called First Copy and it has 2 columns ("Names" and "ID"). The second tab is called Organization. In that tab, I have 3 columns ("Level", "Organization Name", "Date"). "Names" from the first tab and "Organization Name" from the second tab have the exact same information, but some of the wording is slightly different for some of the cells. Some are already exact matches. In the second tab, "Level" is the column that has the parent/child entity numbering. For example, here is what column A & B look like in the second tab:
1 | Marketplace LLC |
2 | Engine Limited |
3 | Gallery |
3 | Moon |
3 | Star |
3 | Dynamic |
2 | Chair |
3 | |
3 | Dataset |
3 | Custom Tools |
4 | Fuzzy |
- "1" means Parent entity. "2" means child entity, "3" etc means child of that child. So in this example, "Marketplace LLC" is the overall parent entity. "Chair" is a child of Marketplace LLC that has its own children, such as Email, Dataset, and Custom Tools. Then, Custom tools has its own child - Fuzzy. And there is a huge list of these.
Here is where I am getting stuck / what the ask is:
-In the first tab in my file, there is no "Level" column that links these together. So, I've been working on a small workflow. It needs to have some sort of join/fuzzy match that compares Column A from the first tab and Column B from the second tab (so just the names). If there is no close match, it returns a blank. At the end, I want it to return 2 columns (Org Names and First Copy). The parent entity with all of it's children and the children's children underneath should be returned. They are already in order in the file. There are also some cases where an entity is in the first tab but not in the second tab. That is when it should return blank in the column of the tab is wasn't found in. For example, if "Vision LLC" was found in the first tab but not the second, it should return "Vision LLC" in First Copy column and "Not found" or a null in the Org Names column.
- My Alteryx workflow so far is two inputs (for each tab), data cleansing tools on each. I tried various methods (appending/unioning/joining/etc) and then attaching the fuzzy match tool, but it never works. The tool itself is hard for me to navigate as well. Any help here would be greatly appreciated, thank you.
The Fuzzy Match Tool is the tool you'll want to use, and Union is how you'll need to combine the data in order to use the tool. Prior to doing that however, use the Formula Tool to add a column [SourceID], set to 1 for first tab and 2 for second tab to use "Merge Mode" in Fuzzy Match Tool. It can also be beneficial to add [RecordID] just before Fuzzy Match Tool so that you can identify exact position of matches. You can check "Output Unmatched Records" to see records that didn't match, although it will sometimes seemingly break (I don't know why it does this) and output matched records as unmatched too. It may be better to determine unmatched records after the fact because of this weird behavior.
NOTE: Make sure that the columns you will be matching on have the same name before doing the Union, so that all match data is in the same column.
If you share some sample data for Tab 1 and 2, I'd be happy to try to throw together a sample workflow for you based on what I understand of the problem so far.
@CoG Sure!
This is a sample of tab 1 (this tab does not have the entities in order with their children. It is completely random, but for this purpose I went ahead and made them in order):
Names | ID |
Seeds Inc | P0123 |
Bear | P0465 |
Forest | FC3747 |
Grass S.A. - Crops | FW385 |
Trees inc. A.S | JF7749 |
Leaves | WE99 |
Stump Ltd | 450PQ |
School LLC | FM01211 |
Rooms | 465732 |
Buses | GP4577 |
Students inc | J1234 |
Tab 2 (this one is already in order always):
Level | Organization Name |
1 | Seeds Inc |
2 | Bear |
3 | Forest |
3 | Grass S.A. - Crops |
2 | Trees inc. A.S |
3 | Leaves Limited |
1 | School LLC |
2 | Rooms |
3 | Buses |
4 | Students inc |
The output should look like this (2 columns). "Leaves" is an example where the wording is slightly different. It is "Leaves" in the first tab but "Leaves Limited" in the second tab.
First Copy Names (first tab results) | Organization Names (second tab) |
Seeds Inc | |
Bear | Bear |
Forest | Forest |
Grass S.A. - Crops | Grass S.A. - Crops |
Trees inc. A.S | Trees inc. A.S |
Leaves | Leaves Limited |
Stump ltd | NOT FOUND |
School LLC | School LLC |
Rooms | Rooms |
Buses | Buses |
Students inc | Students inc |
Here is a sample workflow that I build for you. The difficulty with the fuzzy match tool is that its fuzzy and may make inappropriate associations or fail to make an association. Thus it is important to check the output, but this workflow and the Fuzzy Match Tool can help speed up the association process.
Per direct request, here is a modified version of the workflow that takes into account the initial ordering of Tab 2 data.
This was accomplished by modifying the second chained join to include the RecordID followed by the addition of a Sort Tool to achieve the desired effect.
Let us know if this works! Happy Solving!!!
I'm not quite sure what the error you got was. Here is another variant of the workflow. If this still doesn't work if you could provide some more information that would be helpful. When I ran the workflow, the output I got was in the same order as Tab 2 except for unmatched records from Tab 1 which are just Union-ed on to the rest of the data: