In case you missed the announcement: Alteryx One is here, and so is the Spring Release! Learn more about these new and exciting releases here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Categorical Clustering

muddobber26
8 - Asteroid

Hello - I am looking to perform a categorical clustering of qualitative data and have never done this before. I have a data set with 500K+ rows of bill of materials data where every Finished Good is mapped to each of its Subcomponents like in the example below.

 

Finished GoodComponent
5S4Y56-9A
5S4Y559-0Y
5S4Y14-56-AB
56-SY4-956-9A
56-SY4-9559-0Y

 

What I am looking to do is to identify "similar groupings of finished goods" based on the Components they are tied to. 

 

Any advice for what type of clustering algorithm in Alteryx I should use?

 

Thanks

7 REPLIES 7
mceleavey
17 - Castor
17 - Castor

Hi @muddobber26 ,

 

Can you provide some representative data with an example of what you're trying to achieve?

 

Thanks,

 

M.



Bulien

muddobber26
8 - Asteroid

Yes. See below. Here is the input data and what I am trying to achieve is "groupings of similar parts". So in this example an example output would be 2 different "groupings". Group 1 would be Finished Goods: 5S4Y and 56-SY4-9. Group 2 would be Finished Goods: 45-TU-B and 49-TUV-05. There are a few different variables in play though because not every Finished Good has the same number of components. Some could have 3 and some could have 10. So I would want to output a list of Grouped Finished goods that have some % similarity in terms of actual components and number of components. Does that make sense? 

 

Finished GoodComponent
5S4Y56-9A
5S4Y559-0Y
5S4Y14-56-AB
56-SY4-956-9A
56-SY4-9559-0Y
45-TU-B23-IN
45-TU-BAB-678
45-TU-B451-BO
49-TUV-0523-IN
49-TUV-05AB-678
49-TUV-05452-BV
ArtApa
Alteryx
Alteryx

@muddobber26 - Here is how your solution may look like:

ArtApa_0-1674768892050.png

 

muddobber26
8 - Asteroid

Thanks @ArtApa . Really appreciate it. Is there a way to modify this so that the output could be more than 2 finished goods in each row? I currently have a workflow setup that identifies Finished Goods that are similar to one another which is what your output is. My problem is I am trying to create larger groupings instead of just 1 to 1 comparisons. I want to bubble up those 1:1 comparisons into groupings of finished goods where every finished good in the grouping is ~80% similar to everything else. Thus, why I was probing at doing groupings. Any thoughts or ideas to that end?

ArtApa
Alteryx
Alteryx

Hi @muddobber26 - Can you please share a desired output example? 

muddobber26
8 - Asteroid

Yes. See below. Each finished good would be tagged to a "group" which would be made up of individual Finished Goods with similar (similar perhaps being some % threshold) combination of components AND similar NUMBER of components. Note that some finished goods could have a higher NUMBER of components than other finished goods i.e., FG A could have 25 components and Finished Good B could have 100 components. Even though all of Finish Good A's components are also tagged to Finished Good B, there are 75 Finished Good B components that would NOT be similar which means the Finished Goods should not be tagged to the same similarity group. Does that make sense?

Similarity GroupingFinished GoodComponent
15S4Y56-9A
15S4Y559-0Y
15S4Y14-56-AB
156-SY4-956-9A
156-SY4-9559-0Y
245-TU-B23-IN
245-TU-BAB-678
245-TU-B456TG
249-TUV-0523-IN
249-TUV-05AB-678
249-TUV-05452-BV
1456TG56-9A
1456TG559-0Y
1456TG14-56-AB
1456TG15TG14
2410-U-B23-IN
2410-U-BAB-678
2410-U-B456TKLO
2410-U-B452-BV
muddobber26
8 - Asteroid

@ArtApa Alternatively, I was able to run a workflow that spit out a % similarity between two Finished Goods based on their Components and then I filtered for % similarities greater than 80%. So that's the first step. But now I want to "cluster" these Finished Goods to create the similarity groupings I previously finished.

 

So if I have a list like the one below which shows the % similarity between FG 1 and FG 2, is there a way to run a clustering analysis or some kind of "grouping" analysis to put all FGs that have 80% or greater similarity into the same group? I tried just continuous joining but that didn't seem the most efficient way. And the other problem with that method is that if FG A is 80% similar to FG B and FG B is 80% similar to FG C there is a chance that FG C is only 60% similar to FG A so I wouldn't want them in the same group. Thoughts? Thanks for any help you can provide.

 

GroupingFinished GoodComponent% Similarity
15S4Y56T6881
156T6854GB82
15S4Y8U-9091
2TGB-76TGB-7998
2TGB-79NBV-187
Labels
Top Solution Authors