Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Component Similarity Analysis

Alterisk
5 - Atom

Hey all, I'm trying to run a component similarity analysis between Finished Goods (FG) using Alteryx but unsure how to implement this.


Input - List of Finished Goods with their corresponding components

 

Alterisk_0-1672899341545.png

 

Expected Output - Matrix with similarity % for each finished good combination:

 

Alterisk_1-1672899624787.png

 

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

My approach for the solution is as follows:

1. Build a single list of unique FGs with a concatenated list of corresponding components in each row:

 

Alterisk_2-1672899801229.png

 

2. Build a matrix with the input and the concat list and search for each component in the corresponding concatenated list for each FG combination. If found, enter 1 else enter 0:

 

Alterisk_3-1672899889713.png

 

3. Sum up the values for each combination (eg. FG1-FG1, FG1-FG2, FG1-FG3, and so on):

 

Alterisk_4-1672899942357.png

 

4. Build a new matrix which represents the count of distinct components for each FG combination (for e.g. FG1 has components 1 and 2, while FG2 has components 1,3,4. So the resulting FG combination FG1-FG2 will have 4 distinct components (1,2,3,4):

 

Alterisk_5-1672900018050.png

 

5. Divide the matrix obtained in step 3 by the matrix from step 4, thereby giving the similarity matrix

 

Alterisk_6-1672900152107.png

 

 

 

Please let me know if there is a way to implement this in Alteryx, or if there's a better way to get to this output. 
Thanks in advance for all your help! 

 

 

 

 

5 REPLIES 5
FinnCharlton
13 - Pulsar

Hi Alterisk,

Here is a possible solution. It basically follows your method: forming all possible combinations, calculating matches and distinct combinations, then calculating percentages.

FinnCharlton_0-1672916222420.png

FinnCharlton_1-1672916233195.png

Hope this solves your problem.

 

Alterisk
5 - Atom

Thank you @FinnCharlton, that was really helpful and does the job!
However, is there a way to optimize the workflow? I have an input dataset that has over 100k rows (~36k unique FGs, and ~24k unique components), so the Cartesian Join (and subsequent joins) would consume a tremendous amount of memory / storage. It would be helpful if you could also recommend an alternative to this.
Thanks again!

FinnCharlton
13 - Pulsar

Here is an optimised version. As the output is symmetrical (i.e. the combination of FG1/FG2 is the same as FG2/FG1) we only need to calculate a subset of combinations. We also don't need to calculate the combination of an FG with itself, as this will always be 100%. This new workflow only generates the combinations we need.

FinnCharlton_1-1673000191158.png

If you compare the output of this workflow to the one above, you will see that these values are the only important ones - everything else can be inferred. 

 

I think the nature of this analysis means it will always be computationally intensive, but this should help a bit.

 

 

Alterisk
5 - Atom

Thanks for the quick reply on this! The logic does make absolute sense, but unfortunately the FG and component names are strings and not necessarily sequential either (e.g. "FG-XYZ-0A2", "FG-2ABC",etc.), so numerical comparison in the loop will not work.
Is there still an alternative to this solution, by any chance? 

FinnCharlton
13 - Pulsar

I would solve this by assigning a numerical ID to each FG Name and usinng that for the workflow. If you need some help attach some sample data and I will show you what I mean.

Labels