Hi all Alteryx users,
My intention is to create a flow that fills an Excel model with the numbers used to calculate the total time it takes to perform a set of tasks. The time depends on, among other things, the industry that the user of the model belongs to. To do this, I have collected data for the tasks. (The attached file shows one task.)
If I can say that the time differs between industries, I want to use the average time per industry it takes to perform the task. If I cannot say that the task differs by industry given the current data in the set, I want to use the overall average time, regardless of industry. After Googling around for a bit, I intend to use the "Test of Means" tool to do so.
Questions:
1) The tool does not work. Is it because one of the groups contains only one value? That can be adjusted for.
2) How do I do this comparison the best way?
The flow will be updated with data over time, so I would like it to be as automatic as possible. I will also have to do this for a lot of tasks, so keeping a clean flow would be nice.
Best regards,
Erik
Hi @enylen ,
you are right, the group containing only one value causes the problem (at least two values for a group are expected).
The Test of Means tool compares either two groups or each group against a control group. I think you have basically two options:
- compare against an "average for all industries" - this would make sense in my opinion, if the shares of the industries do not differ extremely (an extreme split would be e.g. A 10%, B 5%, C 85% of rows)
- compare each industry against each other by running a series of Test of Means tools selecting one industry as a control group and all others as treatment group (maybe using a macro) - this will be a bit more complicated
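
Outside of Alteryx (for example in a Python tool), the "one industry against all others" idea combined with the fallback to the overall average could be sketched roughly as below. This is only an illustration under assumptions: it does not reproduce the Test of Means tool, it swaps in a simple permutation test on the difference in means so that only the standard library is needed, and the names `permutation_p_value`, `time_per_industry`, `alpha` and `n_perm` are made up for this example. Groups with fewer than two values skip the test and get the overall average.

```python
import random
from statistics import mean


def permutation_p_value(group, rest, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in means (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    observed = abs(mean(group) - mean(rest))
    pooled = list(group) + list(rest)
    k = len(group)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Count shuffles whose difference is at least as large as the observed one
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_perm + 1)


def time_per_industry(times_by_industry, alpha=0.05):
    """Return the time to use per industry: its own mean if it differs, else the overall mean."""
    all_times = [t for ts in times_by_industry.values() for t in ts]
    overall = mean(all_times)
    result = {}
    for industry, times in times_by_industry.items():
        rest = [t for other, ts in times_by_industry.items()
                if other != industry for t in ts]
        # Assumption: at least two values per side are required before testing
        if len(times) < 2 or len(rest) < 2:
            result[industry] = overall
            continue
        p = permutation_p_value(times, rest)
        result[industry] = mean(times) if p < alpha else overall
    return result


if __name__ == "__main__":
    # Made-up sample data, not taken from the attached file
    sample = {
        "A": [10, 11, 9, 10, 12, 10],
        "B": [10, 11, 9, 10, 11, 10],
        "C": [30, 31, 29, 32, 30, 31],
        "D": [12],
    }
    for industry, value in time_per_industry(sample).items():
        print(industry, round(value, 2))
```

Note that in a one-against-the-rest comparison, a single very different industry also shifts the "rest" for every other industry, so the threshold `alpha` and the choice of test deserve a second look before relying on the result.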
Hope this helps in any way.
Best,
Roland
Great,
Thanks for the reply! I managed to erase my own response.
Sometimes the shares of the industries differ extremely due to lack of data. This is due to randomness; the population does not differ that much. However, no industry is dominant compared to the others in the sample. (Sample: Industry A 10%, Industry B 10%, Industry C 10%, Industry D 3%, Industry E 1%, Industry F 20%, etc.)
I would like to avoid running a series of Test of Means tools if possible, since it would create a spider web of flows. I have to do this for a lot of tasks, and not all tasks will be grouped by industry; some will be grouped by other things.
What do you think?
I think that if there is no dominant industry, comparing against an average should work. As I understand your intention, you want to analyze the influence of industry on the time needed for a set of tasks. Comparing to the average will classify each industry as "above average" or "below average" to a certain degree, so you get an ordering (high to low industry-specific effort) and a distance to the average, which should be sufficient to evaluate the expected effort of a task (I assume you are doing something like a precalculation).
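
To illustrate the ordering and distance to average, here is a small Python sketch. It assumes the overall average is taken over all rows (so industries with more rows weigh more), and the function name `deviation_from_average` and the sample numbers are made up for the example.

```python
from statistics import mean


def deviation_from_average(times_by_industry):
    """Rank industries by how far their mean time lies from the overall mean."""
    # Assumption: the overall average is computed over all rows, not over group means
    all_times = [t for ts in times_by_industry.values() for t in ts]
    overall = mean(all_times)
    rows = []
    for industry, times in times_by_industry.items():
        diff = mean(times) - overall
        if diff > 0:
            label = "above average"
        elif diff < 0:
            label = "below average"
        else:
            label = "at average"
        rows.append((industry, diff, label))
    # Highest industry-specific effort first
    rows.sort(key=lambda row: row[1], reverse=True)
    return overall, rows


if __name__ == "__main__":
    # Made-up sample data
    overall, ranking = deviation_from_average(
        {"A": [10, 10], "B": [20, 20], "C": [30, 30]}
    )
    print("overall:", overall)
    for industry, diff, label in ranking:
        print(industry, diff, label)
```

The sign of the difference gives the "above" or "below" label, and its size gives the distance that can feed into the effort estimate.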