Alteryx Designer Desktop Discussions

SOLVED

Data Not Normally Distributed, which test?

Wouterrrrrr
5 - Atom

Hi,

 

I want to test my data, but it is not normally distributed and I don't know which test to use. My data contains two groups that I want to compare. The variable measures the time between two events (average = 7 days). The data is not normal because of some outliers, which I need to filter out.

1) What is the best way to handle outliers (z-score?)
2) Which test can I use to compare both groups?

 

Hope you guys can help me.

8 REPLIES
caltang
17 - Castor

This question involves some feature engineering and data science knowledge. You could treat any observation with a z-score below -3 or above +3 as an outlier and drop it, which reduces the number of observations in your analysis.
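
For readers who want to try the ±3 z-score rule outside Alteryx, here is a minimal Python sketch. The function name `filter_by_zscore`, the default threshold, and the sample values are all assumptions made up for illustration; they are not the poster's data or an Alteryx tool.

```python
from statistics import mean, pstdev

def filter_by_zscore(values, threshold=3.0):
    """Keep only values whose z-score lies within [-threshold, +threshold]."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return list(values)  # all values identical, nothing to flag
    return [v for v in values if abs((v - mu) / sigma) <= threshold]

# Made-up "days between events" values (assumption, not the thread's data).
days = [0, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 10, 11, 120]

kept = filter_by_zscore(days)
removed = [v for v in days if v not in kept]
print("removed:", removed)
```

Keep in mind that an extreme value inflates both the mean and the standard deviation it is measured against, so in very small samples a single outlier may never reach a z-score of 3.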

 

Your 2nd question is a bit vague. What are you testing? What are you testing between? Not very clear...

 

Also, it helps if you can provide your data or your workflow. Please provide relevant data to this use case, and kindly provide your criteria in as much detail as possible. If you have a workflow built halfway, kindly export that over as well. 

 

To export a workflow go to: Options > Export Workflow. Kindly do NOT send a "Save As" copy.

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
Wouterrrrrr
5 - Atom

Hi caltang,

 

Thank you for your response. I want to know which group is the quickest. However, the other tests for comparing the means of two groups assume that the data is normally distributed.

 

caltang
17 - Castor

Without the data, no one can give you good help. Can you kindly provide some data so that the community has something to work with...?

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
Wouterrrrrr
5 - Atom

Hope this works ;) 

caltang
17 - Castor

My apologies, I just saw this. Based on your data, can you explain a bit more about “Days”? Is it TAT (turnaround time)?

Having 0s will affect your standardisation. 

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
Wouterrrrrr
5 - Atom

Yes, the days are the time between two events, so 0 means the first and second event happened on the same day. This matters to me because it means the time between events is short. The more days that pass, the less result we expect. That is what I want to test.

BS_THE_ANALYST
14 - Magnetar

@Wouterrrrrr this resource is fantastic if you're looking for predictive value: https://community.alteryx.com/t5/Data-Science/Alteryx-Predictive-Tools-Flowchart/ba-p/602881

 

if you're looking at which statistical test to apply, I think flowcharts are invaluable (source: https://onishlab.colostate.edu/summer-statistics-workshop-2019/which_test_flowchart/). 

Screenshot 2023-10-24 085158.png

 

Also run your ideas through ChatGPT/Google to check the assumptions of each test, etc.

 

All the best, 

BS 

 

caltang
17 - Castor

The right skew will exist even after you calculate your z-scores. The z-score doesn't inherently change the shape of the distribution.
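
A quick way to see this for yourself: z-scoring is only a shift and a rescale, so the skewness is unchanged. The sketch below uses made-up right-skewed values (an assumption, not the thread's data) and a hand-rolled population skewness, the mean of the cubed z-scores.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    return mean([z ** 3 for z in zscores(values)])

# Made-up right-skewed "days" values (assumption for illustration only).
days = [0, 0, 1, 1, 2, 3, 4, 6, 9, 15, 30, 60]

skew_raw = skewness(days)          # skewness of the original values
skew_z = skewness(zscores(days))   # skewness after standardizing

print(round(skew_raw, 4), round(skew_z, 4))  # the two numbers match
```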

 

I'm pretty stumped myself - you may have to try other methodologies to cater to your non-standardized data. 

 

Perhaps you can try standardizing the days data this way:

image.png

 

But is your research question: Is there a difference in days between Closed Won and other stages as you expected?

 

H0: No difference between Closed Won and other stages

Ha: There is a difference between Closed Won and other stages
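
One common option for testing hypotheses like these without assuming normality is a rank-based test such as the Mann-Whitney U test. Nobody in this thread confirmed that it fits this dataset, so treat the following as a rough sketch only. It uses the normal approximation with a tie correction and no continuity correction, and the function name `mann_whitney_u` and both samples are assumptions made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U statistic for sample a, approximate two-sided p-value).
    """
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])

    rank_sum_a = 0.0  # sum of ranks belonging to sample a
    tie_term = 0.0    # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                   # size of this tie group
        avg_rank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        in_a = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_a += avg_rank * in_a
        tie_term += t ** 3 - t
        i = j

    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_a, 1.0  # every value identical, no evidence of a difference
    z = (u_a - mean_u) / sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, p

# Made-up days per group (assumption, not the poster's data).
closed_won = [0, 0, 1, 1, 2, 2, 3, 4, 5, 7]
other_stages = [3, 5, 6, 8, 9, 12, 14, 20, 35, 60]

u, p = mann_whitney_u(closed_won, other_stages)
print("U =", u, "p =", round(p, 4))
```

A small p-value would be evidence against H0. Note that the test compares the ranks of the two groups rather than their means, and the normal approximation is rough for small samples.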

 

If you do reach standardization, this article is useful: https://www.thedataschool.co.uk/liu-zhang/test-for-the-difference-in-the-mean-t-test-in-alteryx/ 

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/