Alteryx Designer Desktop Discussions

muddobber26 · ‎01-25-2023

Hello - I am looking to perform a categorical clustering of qualitative data and have never done this before. I have a data set with 500K+ rows of bill of materials data where every Finished Good is mapped to each of its Subcomponents like in the example below.

Finished Good	Component
5S4Y	56-9A
5S4Y	559-0Y
5S4Y	14-56-AB
56-SY4-9	56-9A
56-SY4-9	559-0Y

What I am looking to do is to identify "similar groupings of finished goods" based on the Components they are tied to.

Any advice for what type of clustering algorithm in Alteryx I should use?

Thanks

mceleavey · ‎01-26-2023

Hi @muddobber26 ,

Can you provide some representative data with an example of what you're trying to achieve?

Thanks,

M.

muddobber26 · ‎01-26-2023

Yes. See below. Here is the input data and what I am trying to achieve is "groupings of similar parts". So in this example an example output would be 2 different "groupings". Group 1 would be Finished Goods: 5S4Y and 56-SY4-9. Group 2 would be Finished Goods: 45-TU-B and 49-TUV-05. There are a few different variables in play though because not every Finished Good has the same number of components. Some could have 3 and some could have 10. So I would want to output a list of Grouped Finished goods that have some % similarity in terms of actual components and number of components. Does that make sense?

Finished Good	Component
5S4Y	56-9A
5S4Y	559-0Y
5S4Y	14-56-AB
56-SY4-9	56-9A
56-SY4-9	559-0Y
45-TU-B	23-IN
45-TU-B	AB-678
45-TU-B	451-BO
49-TUV-05	23-IN
49-TUV-05	AB-678
49-TUV-05	452-BV

ArtApa · ‎01-26-2023

@muddobber26 - Here is how your solution may look like:

muddobber26 · ‎01-28-2023

Thanks @ArtApa . Really appreciate it. Is there a way to modify this so that the output could be more than 2 finished goods in each row? I currently have a workflow setup that identifies Finished Goods that are similar to one another which is what your output is. My problem is I am trying to create larger groupings instead of just 1 to 1 comparisons. I want to bubble up those 1:1 comparisons into groupings of finished goods where every finished good in the grouping is ~80% similar to everything else. Thus, why I was probing at doing groupings. Any thoughts or ideas to that end?

ArtApa · ‎01-28-2023

Hi @muddobber26 - Can you please share a desired output example?

muddobber26 · ‎01-28-2023

Yes. See below. Each finished good would be tagged to a "group" which would be made up of individual Finished Goods with similar (similar perhaps being some % threshold) combination of components AND similar NUMBER of components. Note that some finished goods could have a higher NUMBER of components than other finished goods i.e., FG A could have 25 components and Finished Good B could have 100 components. Even though all of Finish Good A's components are also tagged to Finished Good B, there are 75 Finished Good B components that would NOT be similar which means the Finished Goods should not be tagged to the same similarity group. Does that make sense?

Similarity Grouping	Finished Good	Component
1	5S4Y	56-9A
1	5S4Y	559-0Y
1	5S4Y	14-56-AB
1	56-SY4-9	56-9A
1	56-SY4-9	559-0Y
2	45-TU-B	23-IN
2	45-TU-B	AB-678
2	45-TU-B	456TG
2	49-TUV-05	23-IN
2	49-TUV-05	AB-678
2	49-TUV-05	452-BV
1	456TG	56-9A
1	456TG	559-0Y
1	456TG	14-56-AB
1	456TG	15TG14
2	410-U-B	23-IN
2	410-U-B	AB-678
2	410-U-B	456TKLO
2	410-U-B	452-BV

muddobber26 · ‎01-29-2023

@ArtApa Alternatively, I was able to run a workflow that spit out a % similarity between two Finished Goods based on their Components and then I filtered for % similarities greater than 80%. So that's the first step. But now I want to "cluster" these Finished Goods to create the similarity groupings I previously finished.

So if I have a list like the one below which shows the % similarity between FG 1 and FG 2, is there a way to run a clustering analysis or some kind of "grouping" analysis to put all FGs that have 80% or greater similarity into the same group? I tried just continuous joining but that didn't seem the most efficient way. And the other problem with that method is that if FG A is 80% similar to FG B and FG B is 80% similar to FG C there is a chance that FG C is only 60% similar to FG A so I wouldn't want them in the same group. Thoughts? Thanks for any help you can provide.

Grouping	Finished Good	Component	% Similarity
1	5S4Y	56T68	81
1	56T68	54GB	82
1	5S4Y	8U-90	91
2	TGB-76	TGB-79	98
2	TGB-79	NBV-1	87

Alteryx Designer Desktop Discussions

Categorical Clustering