Alteryx Designer Desktop Discussions

Wouterrrrrr · ‎10-23-2023

Hi,

I want to test my data, but it is not normally distributed, so I don't know which test to use. My data contains two groups that I want to compare. The variable measures the time between two events (average = 7 days). The data is not normally distributed because of some outliers, which I need to filter out.

1) What is the best way to handle outliers (z-score?)
2) Which test can I use to compare both groups?

Hope you guys can help me.

caltang · ‎10-23-2023

This is a question that involves some feature engineering and data science knowledge. Perhaps you can treat anything with a z-score below -3 or above +3 as an outlier and drop it, thereby reducing the number of observations in your analysis.
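A minimal sketch of that z-score rule, using only the Python standard library. The `flag_outliers` helper and the `days` values are hypothetical and only illustrate the idea; they are not the poster's data or an Alteryx tool.

```python
from statistics import mean, pstdev

def flag_outliers(values, threshold=3.0):
    """Split values into (kept, outliers) using the rule |z| > threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values are identical, so nothing can be an outlier.
        return list(values), []
    kept, outliers = [], []
    for v in values:
        z = (v - mu) / sigma
        if abs(z) > threshold:
            outliers.append(v)
        else:
            kept.append(v)
    return kept, outliers

# Hypothetical "days between events" values, made up for illustration.
days = [0, 1, 2, 3, 3, 4, 5, 5, 6, 7, 7, 8, 9, 10, 12, 0, 2, 4, 6, 120]
kept, outliers = flag_outliers(days)
print("kept:", len(kept), "outliers:", outliers)
```

Note that a large outlier also inflates the mean and standard deviation used to compute the z-scores, so on small or heavily skewed samples this rule can miss values that look extreme by eye.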

Your 2nd question is a bit vague. What are you testing? What are you testing between? Not very clear...

Also, it helps if you can provide your data or your workflow. Please provide relevant data to this use case, and kindly provide your criteria in as much detail as possible. If you have a workflow built halfway, kindly export that over as well.

To export a workflow go to: Options > Export Workflow. Kindly do NOT send a "Save As" copy.

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/

Wouterrrrrr · ‎10-23-2023

Hi caltang,

Thank you for your response. I want to know which group is the quickest. However, the other tests that compare the means of two groups assume that the data is normally distributed.

caltang · ‎10-23-2023

Without the data, no one can give you good help. Can you kindly provide some data so that the community has something to work with...?

Wouterrrrrr · ‎10-23-2023

Hope this works ;)

caltang · ‎10-23-2023

My apologies, I just saw this. Based on your data, can you explain a bit more about “Days”? Is it TAT (turnaround time)?

Having 0s will affect your standardisation.

Wouterrrrrr · ‎10-23-2023

Yes, the days are the time between two events. So 0 means the first and second event happened on the same day. This matters to me because a same-day value means the time between events is short. We suspect that the longer the time between events, the fewer results we get. That is what I want to test.

BS_THE_ANALYST · ‎10-24-2023

@Wouterrrrrr this resource is fantastic if you're looking for predictive value: https://community.alteryx.com/t5/Data-Science/Alteryx-Predictive-Tools-Flowchart/ba-p/602881

If you're deciding which statistical test to apply, I think flowcharts are invaluable (source: https://onishlab.colostate.edu/summer-statistics-workshop-2019/which_test_flowchart/).

Also run your ideas through ChatGPT or Google, including the assumptions behind each test.

All the best,

BS

caltang · ‎10-24-2023

The right skew will exist even after you calculate your z-scores. The z-score doesn't inherently change the shape of the distribution.
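A quick way to check this claim, sketched with the Python standard library. The `skewness` and `z_scores` helpers and the sample values are assumptions made up for illustration, not the attached data.

```python
from statistics import mean, pstdev

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mu, sigma = mean(values), pstdev(values)
    return mean(((v - mu) / sigma) ** 3 for v in values)

def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical right-skewed "days" values.
days = [0, 0, 1, 1, 2, 2, 3, 4, 5, 7, 9, 14, 21, 45]

raw_skew = skewness(days)
z_skew = skewness(z_scores(days))
print(round(raw_skew, 6), round(z_skew, 6))
```

Both numbers come out the same (up to floating-point rounding) because z-scoring only shifts and rescales the values. A transform that changes the shape, such as a log transform, would be needed to reduce the skew, and the zeros in the data would need special handling for that.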

I'm pretty stumped myself - you may have to try other methodologies to cater to your non-standardized data.

Perhaps you can try standardizing the days data this way:

But is your research question: Is there a difference in days between Closed Won and other stages as you expected?

H0: No difference between Closed Won and other stages

Ha: There is a difference between Closed Won and other stages
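If the data cannot be made normal, one distribution-free option for testing H0 against Ha is a permutation test on the difference in means. This is a sketch under stated assumptions, not something the thread settled on: the `permutation_test` helper and both sample groups are hypothetical, and it uses only the Python standard library rather than an Alteryx tool.

```python
import random

def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n_a = len(group_a)
    n_b = len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_resamples):
        # Under H0 the group labels are exchangeable, so reshuffle them.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value

# Hypothetical days-between-events values for the two groups.
closed_won = [0, 0, 1, 1, 2, 2, 3, 3, 4, 5]
other_stages = [2, 4, 6, 7, 9, 10, 12, 15, 21, 40]

observed, p_value = permutation_test(closed_won, other_stages)
print("difference in mean days:", observed, "p-value:", p_value)
```

A small p-value would suggest rejecting H0. Because the test only relies on shuffling group labels, it does not assume normality, although the mean itself is still sensitive to extreme values.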

If you do reach standardization, this article is useful: https://www.thedataschool.co.uk/liu-zhang/test-for-the-difference-in-the-mean-t-test-in-alteryx/

Data Not Normally Distributed, which test?