After running the MB Affinity tool, I have a 6000x6000 matrix of products and their cosine similarity to each other (similar to a correlation matrix).
I'm trying to transform it into a less unwieldy form, that summarizes the top 10 most similar products and their scores for each product. The fields might look like:
[Product] - [#1 Most Similar Product Name] - [#1 Most Similar Product Score] - [#2 Most Similar Product Name] - ... - [#10 Most Similar Product Score]
...
N = 6,000
Has anybody dealt with something like this before? I'm stumped for solutions. Keeping in mind, I'm constrained to working with a 340MB CSV file.
Thank you,
Michael
Solved! Go to Solution.
Here's a module that might do the trick....
Cheers,
Mark
Thank you Mark! I think I was unclear in my outline of the problem.
The data looks like this:
Michael
It's looking much better, thank you.
The next challenge will be to take the top 10 [Name] and [Value] from each [RowNames].
Michael
Thank you Mark!
The remaining work was to use a multi-row formula grouped by the rowname to number products off 1 to 10, convert the numbering to a string and use a formula to create field names (e.g., "Similar Product 1" and "Similar Product Score 1"), then two parallel cross-tabs with the generated field names as headers and the variable name/score as values. Join them together and that's it : )
Michael