
Alteryx Designer Desktop Discussions

SOLVED

Test for same distribution

jcalvo92
7 - Meteor

Hi, newbie here

 

Is there an easy way with Alteryx to check if two numeric variables follow the same distribution?

 

Thanks,

Javi

SydneyF
Alteryx Alumni (Retired)

Hi @jcalvo92,

 

Have you looked into the Distribution Analysis tool at all? It wouldn't directly compare your two variables, but you could test distributions for each variable to determine whether the distribution each variable fits most closely is the same. Another option would be to visually compare your variables using the Histogram tool or the Violin Plot tool. If you're interested in seeing if the variables are correlated with one another, check out the Association Analysis tool.

 

Note that all of these tools are a part of the Data Investigation tool category and require the predictive tools to be installed. 

 

Hope this helps! Please let me know if you have any further questions I might be able to assist you with!

 

Sydney

jcalvo92
7 - Meteor

Hi Sydney, thank you very much for your help!

 

The Distribution Analysis tool would work only for the four distributions provided there, and the Histogram or Violin Plot tools can only give me a 'feeling' of whether the variables' distributions / shapes may be similar. I am looking for a statistical test to create an official procedure, something like 'run this tool and if the result is bigger than X then you may consider that the two variables follow the same distribution'.

 

The Association Analysis tool is what I am currently using to solve the problem, but it is not really what I am trying to do, and I think it is wrong in this case. The three measures provided there (Pearson, Spearman and Hoeffding's D) test whether the two variables are independent, based on a single paired sample of the two variables. My goal here is to determine whether two different samples, which may be of different sizes, follow the same distribution.

 

A well-known test that achieves this is the Kolmogorov-Smirnov test, and you can get similar results using Q-Q plots. Is there a way to test this using Alteryx without resorting to R/Python coding?

 

Thanks

SydneyF
Alteryx Alumni (Retired)

Hi @jcalvo92,

 

There are not currently any native tools in Alteryx that run a Kolmogorov-Smirnov test or generate Q-Q plots for two provided variables. If this is something you would like to see added to the product, please create an idea here. Our product managers are active on this page and are always looking for great new additions for the product. Let me know if you do, and I can go ahead and star it (posts with more stars are more likely to be considered for the product).

 

In the meantime, I know you said you wanted to avoid R and Python coding, but there is an R function included in the stats package that runs a Kolmogorov-Smirnov test. Using this in Alteryx takes three lines of code and doesn't require you to install any new dependencies:

 

 

 

# Read Data In
data.in <- read.Alteryx("#1")

# Run KS test on first and second column in input dataframe
test <- ks.test(data.in[,1], data.in[,2])

# Write out statistic and p-value to the first anchor
write.Alteryx(as.data.frame(c(test[1], test[2])), 1)

 

 

 

I've attached a workflow with this code written in an R tool.
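
To turn the test output into the kind of pass/fail rule you described, something like the following could work. This is only a sketch: the simulated data.in, the alpha cutoff of 0.05, and the reject_same_distribution column name are assumptions for illustration and are not part of the attached workflow. Note that the rule runs on the p-value: a p-value below the cutoff is evidence that the distributions differ, while a larger p-value only means the test found no evidence of a difference.

```r
# Simulated stand-in for the R tool input (assumption); inside Alteryx
# this would be data.in <- read.Alteryx("#1")
set.seed(42)
data.in <- data.frame(a = rnorm(200), b = rnorm(200, mean = 0.5))

# Run KS test on first and second column in input dataframe
test <- ks.test(data.in[, 1], data.in[, 2])

# Cutoff for the decision rule (assumption; choose what fits your procedure)
alpha <- 0.05

# One-row summary: D statistic, p-value, and the resulting decision
result <- data.frame(
  statistic = unname(test$statistic),  # largest gap between the two empirical CDFs
  p.value = test$p.value,
  reject_same_distribution = test$p.value < alpha
)
print(result)
```

Inside the R tool, the print(result) line would be swapped for write.Alteryx(result, 1) so the summary row flows out of the first anchor.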

 

Modifying the code to accept data from two different streams (accepting samples of different sizes) would be pretty straightforward:

 

 

 

# Read data in from the two input anchors
data.in1 <- read.Alteryx("#1")
data.in2 <- read.Alteryx("#2")

# Run KS test on the first column of each input
test <- ks.test(data.in1[,1], data.in2[,1])

# Write out statistic and p-value to the first anchor
write.Alteryx(as.data.frame(c(test[1], test[2])), 1)
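
Since Q-Q plots came up as well: the qqplot() function in the same stats package can produce the paired quantiles even when the two samples differ in size (it interpolates the larger sample down to the length of the smaller one). Below is a rough sketch with simulated data; the sample names, sizes, and output column names are made up for illustration. It also recomputes the KS statistic by hand from the two empirical CDFs, to show what the number reported by ks.test means.

```r
# Simulated samples of different sizes (assumption); inside Alteryx these
# would be the first columns of read.Alteryx("#1") and read.Alteryx("#2")
set.seed(1)
sample1 <- rexp(120)
sample2 <- rexp(300)

# Paired quantiles for a two-sample Q-Q plot, without drawing anything
qq <- qqplot(sample1, sample2, plot.it = FALSE)
qq.points <- data.frame(sample1.quantile = qq$x, sample2.quantile = qq$y)

# KS statistic by hand: largest vertical gap between the two empirical
# CDFs, evaluated at every observed value
grid <- sort(c(sample1, sample2))
d.manual <- max(abs(ecdf(sample1)(grid) - ecdf(sample2)(grid)))

# Same statistic as reported by ks.test
d.ks <- unname(ks.test(sample1, sample2)$statistic)

print(head(qq.points))
print(c(manual = d.manual, ks.test = d.ks))
```

In the R tool, qq.points could be sent downstream with write.Alteryx(qq.points, 1) and charted as a scatter plot; points lying close to the y = x line suggest the two samples have similar distributions.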

 

 

 

I hope this helps! Please let me know if there is anything else I can do to try to support you with this matter.

jcalvo92
7 - Meteor

Thank you very much again, @SydneyF 

 

It seems that I need to have a deeper look at the R tool, but the code seems really straightforward and your example is really good.

 

I will let you know if I post a product idea; first I want to know what the community is asking for. There are countless statistical tests out there, after all.

 

Thanks!
