Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Categorical Clustering

muddobber26
8 - Asteroid

Hello - I am looking to perform a categorical clustering of qualitative data and have never done this before. I have a data set with 500K+ rows of bill of materials data where every Finished Good is mapped to each of its Subcomponents like in the example below.

 

Finished GoodComponent
5S4Y56-9A
5S4Y559-0Y
5S4Y14-56-AB
56-SY4-956-9A
56-SY4-9559-0Y

 

What I am looking to do is to identify "similar groupings of finished goods" based on the Components they are tied to. 

 

Any advice for what type of clustering algorithm in Alteryx I should use?

 

Thanks

7 REPLIES 7
mceleavey
17 - Castor
17 - Castor

Hi @muddobber26 ,

 

Can you provide some representative data with an example of what you're trying to achieve?

 

Thanks,

 

M.



Bulien

muddobber26
8 - Asteroid

Yes. See below. Here is the input data and what I am trying to achieve is "groupings of similar parts". So in this example an example output would be 2 different "groupings". Group 1 would be Finished Goods: 5S4Y and 56-SY4-9. Group 2 would be Finished Goods: 45-TU-B and 49-TUV-05. There are a few different variables in play though because not every Finished Good has the same number of components. Some could have 3 and some could have 10. So I would want to output a list of Grouped Finished goods that have some % similarity in terms of actual components and number of components. Does that make sense? 

 

Finished GoodComponent
5S4Y56-9A
5S4Y559-0Y
5S4Y14-56-AB
56-SY4-956-9A
56-SY4-9559-0Y
45-TU-B23-IN
45-TU-BAB-678
45-TU-B451-BO
49-TUV-0523-IN
49-TUV-05AB-678
49-TUV-05452-BV
ArtApa
Alteryx
Alteryx

@muddobber26 - Here is how your solution may look like:

ArtApa_0-1674768892050.png

 

muddobber26
8 - Asteroid

Thanks @ArtApa . Really appreciate it. Is there a way to modify this so that the output could be more than 2 finished goods in each row? I currently have a workflow setup that identifies Finished Goods that are similar to one another which is what your output is. My problem is I am trying to create larger groupings instead of just 1 to 1 comparisons. I want to bubble up those 1:1 comparisons into groupings of finished goods where every finished good in the grouping is ~80% similar to everything else. Thus, why I was probing at doing groupings. Any thoughts or ideas to that end?

ArtApa
Alteryx
Alteryx

Hi @muddobber26 - Can you please share a desired output example? 

muddobber26
8 - Asteroid

Yes. See below. Each finished good would be tagged to a "group" which would be made up of individual Finished Goods with similar (similar perhaps being some % threshold) combination of components AND similar NUMBER of components. Note that some finished goods could have a higher NUMBER of components than other finished goods i.e., FG A could have 25 components and Finished Good B could have 100 components. Even though all of Finish Good A's components are also tagged to Finished Good B, there are 75 Finished Good B components that would NOT be similar which means the Finished Goods should not be tagged to the same similarity group. Does that make sense?

Similarity GroupingFinished GoodComponent
15S4Y56-9A
15S4Y559-0Y
15S4Y14-56-AB
156-SY4-956-9A
156-SY4-9559-0Y
245-TU-B23-IN
245-TU-BAB-678
245-TU-B456TG
249-TUV-0523-IN
249-TUV-05AB-678
249-TUV-05452-BV
1456TG56-9A
1456TG559-0Y
1456TG14-56-AB
1456TG15TG14
2410-U-B23-IN
2410-U-BAB-678
2410-U-B456TKLO
2410-U-B452-BV
muddobber26
8 - Asteroid

@ArtApa Alternatively, I was able to run a workflow that spit out a % similarity between two Finished Goods based on their Components and then I filtered for % similarities greater than 80%. So that's the first step. But now I want to "cluster" these Finished Goods to create the similarity groupings I previously finished.

 

So if I have a list like the one below which shows the % similarity between FG 1 and FG 2, is there a way to run a clustering analysis or some kind of "grouping" analysis to put all FGs that have 80% or greater similarity into the same group? I tried just continuous joining but that didn't seem the most efficient way. And the other problem with that method is that if FG A is 80% similar to FG B and FG B is 80% similar to FG C there is a chance that FG C is only 60% similar to FG A so I wouldn't want them in the same group. Thoughts? Thanks for any help you can provide.

 

GroupingFinished GoodComponent% Similarity
15S4Y56T6881
156T6854GB82
15S4Y8U-9091
2TGB-76TGB-7998
2TGB-79NBV-187
Labels