Need to create groups of items that are similar across 3 different columns
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi,
I have a data set with unique records in the rows and 3 columns of information per record. TO DO: I need to group the items together based on their similarity across the information in the 3 columns.
The logic for grouping the records is:
- >=70% match on "Description" column, and
- =100% match on "Size" column, and
- >= 50% match on "Label" column
As an example, my data set looks like this:
Record ID | Description | Size | Label |
1 | Dog Food Brand 1 Chicken | 0.5 | Chicken |
2 | Dog Food Brand 2 Chicken | 0.5 | Chckn |
3 | Dog Food Brand 1 Beef | 0.5 | Beef |
4 | Dog Fod Brand 2 Beef | 0.5 | Bf |
5 | Dog Food Brand 1 Chicken | 1 | Chckn |
I would want as an output is this:
- Record 1 and Record 2 are both assigned to the same unique group (this is because the description >70% match, size =100% match and label >50% match)
- Record 3 and 4 are both assigned to the same unique group (this is because the description >70% match, size = 00% match and label >50% match)
- Record 5 has no group (this is because the size column is not =100% match with record 1 and 2 and there are no other records which it should be matched with)
If anyone is able to help it would be greatly appreciated as I have tried using a Fuzzy match, but have not managed to get it right as i don't know how to create the groups based off the similarity match.
- I have attached an example of the actual data set that I am working on
- I have attached my flow
Thank you!
- Labels:
- Fuzzy Match
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Fuzzy Matching is just that -- fuzzy. I like to use the Make Group after the fuzzy match logic to group the similar terms/items/names together. To do this, you can connect a Make Group tool to the J output of your last Join tool and choose article_desc_SKU_ID1 as your first key and article_desc_SKU_ID2 as your second key. One thing to remember is that only the two fields you choose as keys in the Make Group tool are output from the tool.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello, @JTostee000.
I'm not highly adept with fuzzy matching, but I believe your requirements may be better suited to a solution in R or Python.
For example, R has a package called fuzzyjoin. Here's an example of fuzzy matching in Python.
I think giving it a shot in one or both of those programming languages may help you learn more about fuzzy matching and may give you the intended result.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Thank you Prometheus, just confirm will this only make a group for the two items that are matched with the Fuzzy match? Or will it pick up on links between those two items and other items and put them all in a group together?
For example:
- If Record 1 and Record 2 are a match
- And record 2 and 3 are a match
Will the output group Record 1, 2 and 3 in a single group as they overlap?
Thank you
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@JTostee000 It will pick up on the links between those items and other items. Case in point, take a look at the example workflow for the Make Group tool. It uses references Star Wars characters and makes groups based on relationships between two or more characters, linking characters at the 2nd and 3rd degree. Here's the input. Note that only Luke has a relationship with Obi.
relationshipID | key_A | key_B |
1 | Luke | Obi |
2 | Jabba | Boba |
3 | Han | Chewy |
4 | Han | Lando |
5 | Luke | Han |
6 | Boba | Jango |
However, in the output, the Chewy group also includes Obi. Obi is connected to Luke, Luke is connected to Han, and Han is connected to Chewy.
Group | Key |
Boba | Boba |
Boba | Jabba |
Boba | Jango |
Chewy | Chewy |
Chewy | Han |
Chewy | Lando |
Chewy | Luke |
Chewy | Obi |
