Categorical Clustering
Hello, I am looking to perform categorical clustering of qualitative data and have never done this before. I have a dataset with 500K+ rows of bill-of-materials (BOM) data in which every Finished Good is mapped to each of its subcomponents, as in the example below.
Finished Good | Component |
5S4Y | 56-9A |
5S4Y | 559-0Y |
5S4Y | 14-56-AB |
56-SY4-9 | 56-9A |
56-SY4-9 | 559-0Y |
What I want to do is identify groups of similar finished goods based on the components they are tied to.
Any advice on which clustering algorithm in Alteryx I should use?
Thanks
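Whatever algorithm ends up being used, a typical first step is to collapse the long-format rows into one set of components per finished good. The Python below is a minimal sketch of that step outside Alteryx; the function names `component_sets` and `identical_groups` are hypothetical. As a cheap first pass, it also groups finished goods whose component sets are exactly identical (100% similar):

```python
from collections import defaultdict

# Long-format BOM rows: (finished_good, component), as in the example above.
rows = [
    ("5S4Y", "56-9A"),
    ("5S4Y", "559-0Y"),
    ("5S4Y", "14-56-AB"),
    ("56-SY4-9", "56-9A"),
    ("56-SY4-9", "559-0Y"),
]

def component_sets(rows):
    """Collapse (finished_good, component) rows into one component set per finished good."""
    sets = defaultdict(set)
    for fg, comp in rows:
        sets[fg].add(comp)
    # frozenset makes each set hashable so it can be used as a dict key below.
    return {fg: frozenset(comps) for fg, comps in sets.items()}

def identical_groups(sets):
    """Hypothetical helper: group finished goods whose component sets are exactly equal."""
    by_signature = defaultdict(list)
    for fg, comps in sets.items():
        by_signature[comps].append(fg)
    return [sorted(fgs) for fgs in by_signature.values()]

fg_sets = component_sets(rows)
print(identical_groups(fg_sets))
# [['5S4Y'], ['56-SY4-9']]
```

On a large BOM, exact-duplicate grouping may shrink the number of distinct component sets before any pairwise comparison is attempted.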
Hi @muddobber26 ,
Can you provide some representative data with an example of what you're trying to achieve?
Thanks,
M.
Yes, see below. Here is the input data. What I am trying to achieve is groupings of similar parts. In this example the output would be two groupings: Group 1 would contain Finished Goods 5S4Y and 56-SY4-9, and Group 2 would contain Finished Goods 45-TU-B and 49-TUV-05. There are a few variables in play, though, because not every Finished Good has the same number of components: some could have 3 and some could have 10. So I would want to output a list of grouped Finished Goods that meet some % similarity in terms of both the actual components and the number of components. Does that make sense?
Finished Good | Component |
5S4Y | 56-9A |
5S4Y | 559-0Y |
5S4Y | 14-56-AB |
56-SY4-9 | 56-9A |
56-SY4-9 | 559-0Y |
45-TU-B | 23-IN |
45-TU-B | AB-678 |
45-TU-B | 451-BO |
49-TUV-05 | 23-IN |
49-TUV-05 | AB-678 |
49-TUV-05 | 452-BV |
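One common way to turn "similar actual components and similar number of components" into a single percentage is the Jaccard similarity, |A ∩ B| / |A ∪ B|: it rises with shared components and falls when one good carries extra components the other lacks. A minimal sketch on the example data above, assuming Jaccard is an acceptable measure (the thread did not settle on one); `jaccard` is a hypothetical helper:

```python
def jaccard(a, b):
    """Jaccard similarity: shared components divided by all distinct components of both goods."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

fg_sets = {
    "5S4Y": {"56-9A", "559-0Y", "14-56-AB"},
    "56-SY4-9": {"56-9A", "559-0Y"},
    "45-TU-B": {"23-IN", "AB-678", "451-BO"},
    "49-TUV-05": {"23-IN", "AB-678", "452-BV"},
}

print(round(jaccard(fg_sets["5S4Y"], fg_sets["56-SY4-9"]), 3))  # 0.667 (2 shared of 3 distinct)
print(jaccard(fg_sets["45-TU-B"], fg_sets["49-TUV-05"]))        # 0.5   (2 shared of 4 distinct)
print(jaccard(fg_sets["5S4Y"], fg_sets["45-TU-B"]))             # 0.0   (nothing shared)
```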
Thanks @ArtApa, I really appreciate it. Is there a way to modify this so that the output can have more than 2 finished goods in each row? I already have a workflow that identifies pairs of Finished Goods that are similar to one another, which is what your output shows. My problem is that I am trying to create larger groupings instead of just 1:1 comparisons. I want to roll those 1:1 comparisons up into groupings of finished goods where every finished good in a grouping is ~80% similar to every other one. That is why I was looking into grouping. Any thoughts or ideas?
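The requirement that every finished good in a grouping be ~80% similar to every other member matches what complete-linkage hierarchical clustering enforces when it is cut at a similarity threshold: two groups merge only if their least similar pair still clears the threshold. Below is a minimal pure-Python sketch, assuming Jaccard similarity (shared components / all distinct components) as the % similarity; `complete_linkage_groups` and the FG-A to FG-E component sets are hypothetical illustration data, not from the thread.

```python
from itertools import combinations

def jaccard(a, b):
    """Shared components divided by all distinct components of both goods."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def complete_linkage_groups(fg_sets, threshold=0.8):
    """Complete-linkage agglomerative clustering on Jaccard similarity.

    Two groups merge only if their least similar cross pair is still
    >= threshold, so every pair inside a final group meets the threshold.
    """
    names = sorted(fg_sets)
    sim = {}
    for x, y in combinations(names, 2):
        s = jaccard(fg_sets[x], fg_sets[y])
        sim[(x, y)] = sim[(y, x)] = s

    clusters = [[n] for n in names]  # every finished good starts alone

    def linkage(c1, c2):
        # Complete linkage: similarity of the least similar cross pair.
        return min(sim[(x, y)] for x in c1 for y in c2)

    while len(clusters) > 1:
        best, best_pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            s = linkage(clusters[i], clusters[j])
            if s > best:
                best, best_pair = s, (i, j)
        if best < threshold:
            break  # no merge left that keeps every pair >= threshold
        i, j = best_pair
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

# Hypothetical component sets, for illustration only.
example_sets = {
    "FG-A": {f"c{i}" for i in range(1, 10)},  # c1..c9
    "FG-B": {f"c{i}" for i in range(1, 11)},  # c1..c10
    "FG-C": {f"c{i}" for i in range(2, 12)},  # c2..c11
    "FG-D": {f"x{i}" for i in range(1, 5)},   # x1..x4
    "FG-E": {f"x{i}" for i in range(1, 6)},   # x1..x5
}

print(complete_linkage_groups(example_sets, threshold=0.8))
# [['FG-A', 'FG-B'], ['FG-C'], ['FG-D', 'FG-E']]
```

FG-A/FG-B are 90% similar and FG-B/FG-C about 82%, but FG-A/FG-C only about 73%, so FG-C stays on its own instead of being chained in. This naive loop is roughly cubic in the number of finished goods; at real scale a library routine such as SciPy's `scipy.cluster.hierarchy.linkage(..., method='complete')` followed by `fcluster(..., criterion='distance')` on Jaccard distance (1 - similarity) would be the more usual choice. The resulting partition depends on merge order, so it is one valid grouping rather than the only one.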
Hi @muddobber26 - Can you please share a desired output example?
Yes, see below. Each finished good would be tagged to a "group" made up of individual Finished Goods with a similar combination of components (similar perhaps meaning above some % threshold) AND a similar NUMBER of components. Note that some finished goods can have many more components than others, e.g., Finished Good A could have 25 components and Finished Good B could have 100. Even if all of Finished Good A's components are also tagged to Finished Good B, there are 75 Finished Good B components that are NOT shared, so the two Finished Goods should not be tagged to the same similarity group. Does that make sense?
Similarity Grouping | Finished Good | Component |
1 | 5S4Y | 56-9A |
1 | 5S4Y | 559-0Y |
1 | 5S4Y | 14-56-AB |
1 | 56-SY4-9 | 56-9A |
1 | 56-SY4-9 | 559-0Y |
2 | 45-TU-B | 23-IN |
2 | 45-TU-B | AB-678 |
2 | 45-TU-B | 456TG |
2 | 49-TUV-05 | 23-IN |
2 | 49-TUV-05 | AB-678 |
2 | 49-TUV-05 | 452-BV |
1 | 456TG | 56-9A |
1 | 456TG | 559-0Y |
1 | 456TG | 14-56-AB |
1 | 456TG | 15TG14 |
2 | 410-U-B | 23-IN |
2 | 410-U-B | AB-678 |
2 | 410-U-B | 456TKLO |
2 | 410-U-B | 452-BV |
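The 25-versus-100 example is where the choice of similarity measure matters. A measure that only asks how many of the smaller good's components are shared (the overlap coefficient) scores this pair as 100%, whereas Jaccard divides by all distinct components and scores it 25%. A minimal sketch, assuming Jaccard is the intended measure; the component names `C0`, `C1`, ... are hypothetical:

```python
def jaccard(a, b):
    """Shared components / all distinct components: penalises extra components."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def overlap(a, b):
    """Shared components / size of the smaller set: ignores extra components."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0

# Hypothetical sets matching the example in the post:
# FG A has 25 components, FG B has 100 and contains all 25 of FG A's.
fg_a = {f"C{i}" for i in range(25)}
fg_b = {f"C{i}" for i in range(100)}

print(overlap(fg_a, fg_b))  # 1.0  -> would treat A and B as identical
print(jaccard(fg_a, fg_b))  # 0.25 -> well below an 80% threshold
```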
@ArtApa Alternatively, I was able to run a workflow that outputs a % similarity between two Finished Goods based on their Components, and then I filtered for % similarities greater than 80%. So that's the first step. But now I want to "cluster" these Finished Goods to create the similarity groupings I described previously.
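The thread doesn't show how the pairwise % similarity was computed. One way to keep that step tractable on 500K+ rows is to compare only finished goods that share at least one component, which is what a self-join on Component does. The sketch below assumes Jaccard similarity and uses a hypothetical `pairwise_jaccard` helper on the example rows from earlier in the thread:

```python
from collections import Counter, defaultdict
from itertools import combinations

def pairwise_jaccard(rows, threshold=0.8):
    """Jaccard similarity for every pair of finished goods that share a component.

    Mirrors a self-join on Component: pairs with no shared component are
    never generated, so far fewer pairs are scored than with all-pairs.
    """
    comps_by_fg = defaultdict(set)
    fgs_by_comp = defaultdict(set)
    for fg, comp in rows:
        comps_by_fg[fg].add(comp)
        fgs_by_comp[comp].add(fg)

    shared = Counter()  # (fg1, fg2) -> number of shared components
    for fgs in fgs_by_comp.values():
        for pair in combinations(sorted(fgs), 2):
            shared[pair] += 1

    result = {}
    for (x, y), n in shared.items():
        # |A union B| = |A| + |B| - |A intersect B|
        sim = n / (len(comps_by_fg[x]) + len(comps_by_fg[y]) - n)
        if sim >= threshold:
            result[(x, y)] = sim
    return result

# (finished_good, component) rows from the second example in the thread.
rows = [
    ("5S4Y", "56-9A"), ("5S4Y", "559-0Y"), ("5S4Y", "14-56-AB"),
    ("56-SY4-9", "56-9A"), ("56-SY4-9", "559-0Y"),
    ("45-TU-B", "23-IN"), ("45-TU-B", "AB-678"), ("45-TU-B", "451-BO"),
    ("49-TUV-05", "23-IN"), ("49-TUV-05", "AB-678"), ("49-TUV-05", "452-BV"),
]

# A lower threshold than 80% so the small example returns something.
for pair, sim in sorted(pairwise_jaccard(rows, threshold=0.5).items()):
    print(pair, round(sim, 3))
# ('45-TU-B', '49-TUV-05') 0.5
# ('56-SY4-9', '5S4Y') 0.667
```

A component that appears in nearly every finished good (a common fastener, say) generates a pair for almost every combination, so such components may be worth handling separately.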
So if I have a list like the one below, which shows the % similarity between FG 1 and FG 2, is there a way to run a clustering analysis or some kind of "grouping" analysis that puts all FGs with 80% or greater similarity into the same group? I tried chaining repeated joins, but that didn't seem like the most efficient approach. The other problem with that method is that if FG A is 80% similar to FG B and FG B is 80% similar to FG C, FG C might be only 60% similar to FG A, so I wouldn't want them in the same group. Thoughts? Thanks for any help you can provide.
Grouping | Finished Good 1 | Finished Good 2 | % Similarity |
1 | 5S4Y | 56T68 | 81 |
1 | 56T68 | 54GB | 82 |
1 | 5S4Y | 8U-90 | 91 |
2 | TGB-76 | TGB-79 | 98 |
2 | TGB-79 | NBV-1 | 87 |
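The FG A / FG B / FG C concern is exactly what repeated joining produces: it computes connected components of the ≥80% pairs, so anything reachable through a chain lands in one group. The sketch below reproduces that with union-find on the pairs in the table above, then lists member pairs that have no ≥80% row. `connected_groups` and `missing_links` are hypothetical helpers, and reading a missing row as "below 80%" assumes every pair was scored before filtering.

```python
from itertools import combinations

# Pairs at or above 80%, copied from the table above:
# (Finished Good 1, Finished Good 2, % similarity).
pairs = [
    ("5S4Y", "56T68", 81),
    ("56T68", "54GB", 82),
    ("5S4Y", "8U-90", 91),
    ("TGB-76", "TGB-79", 98),
    ("TGB-79", "NBV-1", 87),
]

def connected_groups(pairs):
    """Union-find: the groups you get by repeatedly joining linked pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, _ in pairs:
        parent[find(a)] = find(b)  # link the two roots

    groups = {}
    for x in parent:
        groups.setdefault(find(x), []).append(x)
    return [sorted(g) for g in groups.values()]

def missing_links(group, pairs):
    """Member pairs with no >=80% row, i.e. pairs the transitive join assumed."""
    linked = {frozenset((a, b)) for a, b, _ in pairs}
    return [(x, y) for x, y in combinations(group, 2)
            if frozenset((x, y)) not in linked]

for group in connected_groups(pairs):
    print(group, missing_links(group, pairs))
# ['54GB', '56T68', '5S4Y', '8U-90'] [('54GB', '5S4Y'), ('54GB', '8U-90'), ('56T68', '8U-90')]
# ['NBV-1', 'TGB-76', 'TGB-79'] [('NBV-1', 'TGB-76')]
```

Both example groups have gaps, so neither is a group in which every member is ≥80% similar to every other. A complete-linkage style grouping, which merges two groups only when every cross pair clears the threshold, avoids this chaining.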
