Alteryx Designer Discussions

Null hypothesis test for Non-Normal distributions (can't use Test of Means tool!)

Atom

Hi all!

I'm new(ish) to Alteryx, so apologies if this is a simple question.

 

I run A/B and multivariate tests for a company, so I'm often looking for tests that tell me whether my treatment group has outperformed the control (baseline) group. The easy way to do this in Alteryx is the Test of Means tool. However, I recently realized that this tool uses Welch's t-test, which assumes normally distributed data.

 

Often, my data is NOT normally distributed.

(For instance, if we send out a message to get people to register for a new product, only 5-10% might register; of those who register, 90% might buy 1 thing while 1% buy 10 things, and so on, i.e. the data is very right-skewed. Now I want to understand whether my test increased revenue.)

 

So I understand there are other tests/methods out there (e.g. the Mann-Whitney U test, bootstrapping) that might be a better fit for non-normally distributed data, but I can't find any information about whether such tests exist (or can be assembled from existing tools) within Alteryx or in the Gallery. Does anyone have advice?

 

Thanks in advance!

Alteryx Certified Partner

I recommend taking advantage of the R tool in Alteryx to do what you described.

 

- Bootstrapping can be performed using the following package: https://cran.r-project.org/web/packages/boot/index.html

- The Wilcoxon-Mann-Whitney test is available in base R via wilcox.test(), with no extra package needed. Here's an article with a walkthrough in R:

https://www.r-bloggers.com/wilcoxon-mann-whitney-rank-sum-test-or-test-u/
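To make the second suggestion concrete, here is a minimal sketch of a Wilcoxon-Mann-Whitney test on right-skewed data. The data is simulated and the names (simulate_revenue, control, treatment) and rates are hypothetical stand-ins for a real A/B test extract; only wilcox.test() itself comes from base R.

```r
# Hypothetical revenue per user: mostly zeros with a long right tail.
set.seed(42)
n <- 5000

simulate_revenue <- function(n, register_rate) {
  # 1 if the user registered, 0 otherwise
  registered <- rbinom(n, size = 1, prob = register_rate)
  # Log-normal spend for users who registered; zero for everyone else
  registered * rlnorm(n, meanlog = 3, sdlog = 1)
}

control   <- simulate_revenue(n, register_rate = 0.05)
treatment <- simulate_revenue(n, register_rate = 0.10)

# Two-sided Wilcoxon rank-sum (Mann-Whitney U) test; no normality assumption
wmw <- wilcox.test(treatment, control, alternative = "two.sided")
print(wmw$p.value)
```

Keep in mind that this test compares the two distributions by rank and is not a direct test of mean revenue, so if the mean is what matters, a bootstrap of the difference in means may be the better fit.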

 

Alteryx

Hey @charliel,

 

Currently, none of the standard tools in Alteryx are designed to run a test of means against non-normal data. As @CharlieS suggested, you could build this using the R Tool and the wilcox.test() function. Out of curiosity, have y'all tested your data to confirm that it is not normally distributed? (The Distribution Analysis tool can help with this.)
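If you are already in the R Tool, one way to run a quick normality check is base R's shapiro.test(). This is only a sketch on simulated data (the revenue vector is hypothetical), and it is an alternative to the Distribution Analysis tool, not a description of what that tool does internally.

```r
set.seed(1)
# Hypothetical right-skewed sample standing in for real revenue data
revenue <- rlnorm(500, meanlog = 3, sdlog = 1)

# Shapiro-Wilk test; accepts between 3 and 5000 observations
sw <- shapiro.test(revenue)

# A small p-value is evidence against normality
print(sw$p.value)
```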

 

Also, have you had any issues with the AB Testing tools? If so, can you briefly explain what the issues are, and we can try to help you work through them. :)

 

Thanks!

Hoss Carroll
Customer Support Engineer
Alteryx
Alteryx Certified Partner

There is a tool package within Alteryx for A/B testing.

 

But, to build on what Charlie said, you can use an OverSample tool to boost the share of the rare cases you are interested in. Then you can build out a difference-in-differences equation in Alteryx.

 

https://www.youtube.com/watch?v=RKZXKMcsXqg

 

 

This will run using the Linear Regression tool, and it is much more forgiving with your skewed data. You will need to multiply the variables together (to create the interaction term) before you run your linear regression.
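As a rough illustration of that regression, here is a sketch in R on simulated data. All column names (treated, post, treated_post, outcome) and the effect sizes are assumptions made up for the example; in a workflow the product column would be built upstream before the regression step.

```r
set.seed(7)
n <- 2000

treated <- rbinom(n, 1, 0.5)  # 1 = treatment group, 0 = control (hypothetical)
post    <- rbinom(n, 1, 0.5)  # 1 = after the test started, 0 = before (hypothetical)

# Simulated outcome with a known treatment effect of 2 in the post period
true_effect <- 2
outcome <- 10 + 1 * treated + 0.5 * post + true_effect * treated * post + rnorm(n)

df <- data.frame(outcome, treated, post)

# Multiply the two indicators before fitting, as described above
df$treated_post <- df$treated * df$post

fit <- lm(outcome ~ treated + post + treated_post, data = df)

# The coefficient on the product term is the difference-in-differences estimate
did_estimate <- unname(coef(fit)["treated_post"])
print(did_estimate)
```

The estimate should land close to the simulated effect; with real data, the same coefficient is the quantity of interest.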

Meteor

Charlie(s),

 

The R tool is definitely the right way to go about this and I agree with using bootstrapping to get it done.
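For the A/B comparison itself, a bootstrap of the difference in mean revenue could look something like the sketch below. The data is simulated and the names (df, group, revenue, diff_in_means) are hypothetical; boot() and boot.ci() are from the 'boot' package, and the strata argument keeps each resample balanced between the two groups.

```r
library(boot)
set.seed(123)
n <- 2000

# Hypothetical skewed revenue: most users spend nothing
df <- data.frame(
  group   = rep(c("control", "treatment"), each = n),
  revenue = c(rbinom(n, 1, 0.05) * rlnorm(n, 3, 1),
              rbinom(n, 1, 0.10) * rlnorm(n, 3, 1))
)

# Statistic: mean(treatment) - mean(control) on the resampled rows
diff_in_means <- function(d, idx) {
  resampled <- d[idx, ]
  mean(resampled$revenue[resampled$group == "treatment"]) -
    mean(resampled$revenue[resampled$group == "control"])
}

# Resample within each group so group sizes stay fixed
b <- boot(df, diff_in_means, R = 2000, strata = factor(df$group))

# Percentile 95% confidence interval for the difference in means
ci <- boot.ci(b, conf = 0.95, type = "perc")
print(ci$percent[4:5])  # lower and upper bounds
```

If the interval excludes zero, that is evidence of a difference in mean revenue without leaning on a normality assumption.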

 

Unfortunately, I cannot get the R tool to run without an error when using the 'boot' library. The core code below runs in RStudio but DOES NOT run in the R tool. I've marked the Alteryx-specific lines with comments:

 

# Begin R code
library(boot)

### ALTERYX-SPECIFIC ###
data <- read.Alteryx('#1', mode = 'data.frame')

samplemean <- function(x, d) {
  return(mean(x[d]))
}

result <- boot(data$Roll, samplemean, R = 1000)

### ALTERYX-SPECIFIC ###
write.Alteryx(result, 1)
# End R Code

 

The non-Alteryx-specific code runs perfectly in RStudio. In Alteryx I get an error every time: "Error: R (4): Error in the.column[[1]] : object of type 'closure' is not subsettable"

 

I don't think the 'boot' class object will write to output 1 unless I write out a specific part or cast the output to a data frame.

 

But why would Alteryx fail with a subsettable error when base R does not?
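One likely explanation, offered as a hypothesis and not a confirmed diagnosis: the object returned by boot() is a list of class "boot" that includes the statistic function itself (a closure), and the failing line is write.Alteryx(result, 1), which never runs in RStudio. The sketch below pulls the numeric pieces into data frames before writing. The simulated input stands in for read.Alteryx(), the output names (replicates, summary_row) are made up, and the write.Alteryx() call is guarded so the code also runs outside Alteryx.

```r
library(boot)
set.seed(99)

# Stand-in for read.Alteryx('#1', mode = 'data.frame'); 'Roll' mimics the column above
data <- data.frame(Roll = sample(1:6, 200, replace = TRUE))

samplemean <- function(x, d) mean(x[d])
result <- boot(data$Roll, samplemean, R = 1000)

# 'result' is not tabular: one of its elements is the statistic function
print(is.function(result$statistic))

# Extract plain numeric pieces into data frames
replicates <- data.frame(
  replicate = seq_along(result$t[, 1]),
  boot_mean = result$t[, 1]          # one bootstrap mean per replicate
)
summary_row <- data.frame(
  observed_mean = result$t0,         # mean of the original sample
  bootstrap_se  = sd(result$t[, 1])  # bootstrap standard error
)

# Only call write.Alteryx() when running inside the Alteryx R tool
if (exists("write.Alteryx")) {
  write.Alteryx(replicates, 1)
} else {
  print(head(replicates))
  print(summary_row)
}
```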

 
