Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Comparison table

CainR
5 - Atom

Hi Team

 

Currently I want to be able to do a table that has the same Rows and Headers, the ouput that is desired would look like this

 

 BobDerekJenniferSally
BobxStd DevStd DevStd Dev
DerekStd DevxStd DevStd Dev
JenniferStd DevStd DevxStd Dev
SallyStd DevStd DevStd Devx

 

Column Bob Row Derek would ideally show the standard deviation of those two names records. 

The data that I have as a rough example looks like this.

NameValues
Bob5
Bob6
Bob5
Bob7
Bob9
Derek12
Derek10
Derek8
Jennifer7
Jennifer7
Jennifer7
Jennifer5
Jennifer8
Sally20
Sally18
Sally21

 

Could someone please offer some assistance in figuring this out?

Thanks

 

4 REPLIES 4
KaneG
Alteryx Alumni (Retired)

Hi @CainR,

 

In order to do this, you would need to calculate the StdDev for each combination and so working backwards, you want to get to a table that has the following Columns:

  • [Name]
  • [Name2]
  • [StdDev]

so that you can eventually crosstab that in to your required format.

 

To Do that, you would append the dataset to itself and then remove the records where Name==Name2 (Note, this would be bad if the dataset had > 1000 records, in that case, you would employ a slightly different method). 

 

Image 001 - 20161031 - 171004.png

 

After that, you can make a Unique set of combinations as a base for your final dataset before grouping all the records in such a way that you can summarise to work out the StdDev and then Join that back on to the clean dataset.

 

Image 001 - 20161031 - 171028.png

 

 

I've attached a module to do this, however there may be an easier way if your dataset is more defined or the reasson is more defined.

 

Kane

CainR
5 - Atom

Hi Kane

 

Thanks for the help and quick response, the purpose of this is to do a comparison of averages, std deviations and coefficient of variation across all the possible combinations in the matrix, there will be a seperate matrix for each value. This highlights to my team areas of potential interest. The current workflow is great for the small scale however I should have specified that the dataset has >1000 records. Could you please demonstrate the alternative method you mentioned?

 

Cheers

Joe_Mako
12 - Quasar

Here is a route that takes advantage of combining the aggregations. The attached computes aggregrations to the Name granurality, appends to get all potential name pairs, then computes the combined average and standard deviation. I also added some steps to reshape the results into the output you requested.

 

For Combining Standard Deviations and Averages, see:
http://stats.stackexchange.com/questions/55999/is-it-possible-to-find-the-combined-standard-deviatio...

 

in the code below:
m=Average
n=Count
s=Standard Deviation

m=
(m1*n1+m2*n2)/(n1+n2)

s=
SQRT(((n1-1)*POW(s1,2)+(n2-1)*POW(s2,2)+n1*POW((m1-m),2)+n2*POW((m2-m),2))/(n1+n2-1))

For Combining Correlation Coefficients (did not do in the attached), see:
http://mathoverflow.net/questions/57908/combining-correlation-coefficients

 

This should work fine if you have a few thousand names (but the crosstab would be insane), and if you want to be even more efficient, just calculate one side of the diagonal by changing the filter from:

[Name1] != [Name2]

to

[Name1] > [Name2]

Combine Avg and StdDev.png

 

KaneG
Alteryx Alumni (Retired)

Well, there you go... the example from @Joe_Mako is pretty on the money.

 

I didn't have a solution in mind when I referred to a slightly different method as I just quickly built out what first came to mind, but more aggregation/processing would have to be moved to before the append and that's what Joe's solution does. I like that solution because it offers the best reduction of processing speed.

 

However if you wanted to do it without the formula tool, I've attached a version that does that as well as I wanted to see if my brain was on the right track as to what I thought could be done.

 

Kane

Labels