Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Need to create groups of items that are similar across 3 different columns

JTostee000
5 - Atom

Hi,

I have a data set with unique records in the rows and 3 columns of information per record. TO DO: I need to group the items together based on their similarity across the information in the 3 columns.

The logic for grouping the records is:
-    >=70% match on "Description" column, and

-    =100% match on "Size" column, and

-    >= 50% match on "Label" column


As an example, my data set looks like this:

 

Record IDDescriptionSizeLabel
1Dog Food Brand 1 Chicken0.5Chicken
2Dog Food Brand 2 Chicken0.5Chckn
3Dog Food Brand 1 Beef0.5Beef
4Dog Fod Brand 2 Beef0.5Bf
5Dog Food Brand 1 Chicken1Chckn

 

I would want as an output is this:
- Record 1 and Record 2 are both assigned to the same unique group (this is because the description >70% match, size =100% match and label >50% match)

- Record 3 and 4 are both assigned to the same unique group (this is because the description >70% match, size = 00% match and label >50% match)
- Record 5 has no group (this is because the size column is not =100% match with record 1 and 2 and there are no other records which it should be matched with)

If anyone is able to help it would be greatly appreciated as I have tried using a Fuzzy match, but have not managed to get it right as i don't know how to create the groups based off the similarity match.

- I have attached an example of the actual data set that I am working on

- I have attached my flow

Thank you!

4 REPLIES 4
Prometheus
12 - Quasar

Fuzzy Matching is just that -- fuzzy. I like to use the Make Group after the fuzzy match logic to group the similar terms/items/names together. To do this, you can connect a Make Group tool to the J output of your last Join tool and choose article_desc_SKU_ID1 as your first key and article_desc_SKU_ID2 as your second key. One thing to remember is that only the two fields you choose as keys in the Make Group tool are output from the tool.

acarter881
12 - Quasar

Hello, @JTostee000.

 

I'm not highly adept with fuzzy matching, but I believe your requirements may be better suited to a solution in R or Python.

 

For example, R has a package called fuzzyjoin. Here's an example of fuzzy matching in Python.

 

I think giving it a shot in one or both of those programming languages may help you learn more about fuzzy matching and may give you the intended result.

 

 

JTostee000
5 - Atom

Thank you Prometheus, just confirm will this only make a group for the two items that are matched with the Fuzzy match? Or will it pick up on links between those two items and other items and put them all in a group together?


For example:
- If Record 1 and Record 2 are a match

- And record 2 and 3 are a match

 

Will the output group Record 1, 2 and 3 in a single group as they overlap?

Thank you

Prometheus
12 - Quasar

@JTostee000 It will pick up on the links between those items and other items. Case in point, take a look at the example workflow for the Make Group tool. It uses references Star Wars characters and makes groups based on relationships between two or more characters, linking characters at the 2nd and 3rd degree. Here's the input. Note that only Luke has a relationship with Obi.

relationshipIDkey_Akey_B
1LukeObi
2JabbaBoba
3HanChewy
4HanLando
5LukeHan
6BobaJango

 

However, in the output, the Chewy group also includes Obi. Obi is connected to Luke, Luke is connected to Han, and Han is connected to Chewy. 

GroupKey
BobaBoba
BobaJabba
BobaJango
ChewyChewy
ChewyHan
ChewyLando
ChewyLuke
ChewyObi
Labels