Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Component Similarity Analysis

Alterisk
5 - Atom

Hey all, I'm trying to run a component similarity analysis between Finished Goods (FG) using Alteryx but unsure how to implement this.


Input - List of Finished Goods with their corresponding components

 

Alterisk_0-1672899341545.png

 

Expected Output - Matrix with similarity % for each finished good combination:

 

Alterisk_1-1672899624787.png

 

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

My approach for the solution is as follows:

1. Build a single list of unique FGs with a concatenated list of corresponding components in each row:

 

Alterisk_2-1672899801229.png

 

2. Build a matrix with the input and the concat list and search for each component in the corresponding concatenated list for each FG combination. If found, enter 1 else enter 0:

 

Alterisk_3-1672899889713.png

 

3. Sum up the values for each combination (eg. FG1-FG1, FG1-FG2, FG1-FG3, and so on):

 

Alterisk_4-1672899942357.png

 

4. Build a new matrix which represents the count of distinct components for each FG combination (for e.g. FG1 has components 1 and 2, while FG2 has components 1,3,4. So the resulting FG combination FG1-FG2 will have 4 distinct components (1,2,3,4):

 

Alterisk_5-1672900018050.png

 

5. Divide the matrix obtained in step 3 by the matrix from step 4, thereby giving the similarity matrix

 

Alterisk_6-1672900152107.png

 

 

 

Please let me know if there is a way to implement this in Alteryx, or if there's a better way to get to this output. 
Thanks in advance for all your help! 

 

 

 

 

5 REPLIES 5
FinnCharlton
13 - Pulsar

Hi Alterisk,

Here is a possible solution. It basically follows your method: forming all possible combinations, calculating matches and distinct combinations, then calculating percentages.

FinnCharlton_0-1672916222420.png

FinnCharlton_1-1672916233195.png

Hope this solves your problem.

 

Alterisk
5 - Atom

Thank you @FinnCharlton, that was really helpful and does the job!
However, is there a way to optimize the workflow? I have an input dataset that has over 100k rows (~36k unique FGs, and ~24k unique components), so the Cartesian Join (and subsequent joins) would consume a tremendous amount of memory / storage. It would be helpful if you could also recommend an alternative to this.
Thanks again!

FinnCharlton
13 - Pulsar

Here is an optimised version. As the output is symmetrical (i.e. the combination of FG1/FG2 is the same as FG2/FG1) we only need to calculate a subset of combinations. We also don't need to calculate the combination of an FG with itself, as this will always be 100%. This new workflow only generates the combinations we need.

FinnCharlton_1-1673000191158.png

If you compare the output of this workflow to the one above, you will see that these values are the only important ones - everything else can be inferred. 

 

I think the nature of this analysis means it will always be computationally intensive, but this should help a bit.

 

 

Alterisk
5 - Atom

Thanks for the quick reply on this! The logic does make absolute sense, but unfortunately the FG and component names are strings and not necessarily sequential either (e.g. "FG-XYZ-0A2", "FG-2ABC",etc.), so numerical comparison in the loop will not work.
Is there still an alternative to this solution, by any chance? 

FinnCharlton
13 - Pulsar

I would solve this by assigning a numerical ID to each FG Name and usinng that for the workflow. If you need some help attach some sample data and I will show you what I mean.

Labels