Alteryx Designer Desktop Discussions

enylen · ‎12-18-2019

Hi all Alteryx users,

My intention is to create a flow that fills an Excel model with the numbers used to calculate the total time it takes to perform a set of tasks. That time depends on, among other things, the industry that the user of the model belongs to. To do so, I have collected data for the tasks. (The attached file shows one task.)

I want to use the average time per industry for a task if the data show that the time differs between industries, and the overall average (regardless of industry) if the current data do not show such a difference. After Googling around for a bit, I intend to use the "Test of Means" tool to do so.

Questions:

1) The tool does not work. Is it because one of the groups (as defined by the group identifier) contains only 1 value? That can be adjusted for.

2) What is the best way to do this comparison?

I would like to test whether the industries differ, but from my understanding I need a control group that everything else is compared to.
Can I use the overall average (regardless of industry) as the control group? In that case I would test whether each industry average differs from the average for all industries, rather than whether the industries differ among themselves. Does that matter in practice?

The flow will be updated with data over time, so I would like it to be as automatic as possible. I will also have to do this for a lot of tasks, so keeping a clean flow would be nice.

Best regards,

Erik

RolandSchubert · ‎12-18-2019

Hi @enylen ,

You are right, the group containing only one value causes the problem (the tool expects at least two values per group).
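To make the point concrete outside of Alteryx, here is a minimal Python sketch of the check: count the values per group and set aside any group with fewer than two. The row layout, the industry labels and the function name `split_by_group_size` are made up for illustration, not taken from the attached file.

```python
from collections import defaultdict

# Hypothetical sample rows: (industry, minutes needed for the task).
rows = [
    ("A", 12.0), ("A", 15.0), ("A", 11.0),
    ("B", 20.0), ("B", 22.0),
    ("C", 30.0),  # only one observation, so no variance can be estimated
]

def split_by_group_size(rows, min_size=2):
    """Group times by industry and separate groups that are too small to test."""
    groups = defaultdict(list)
    for industry, minutes in rows:
        groups[industry].append(minutes)
    testable = {k: v for k, v in groups.items() if len(v) >= min_size}
    too_small = {k: v for k, v in groups.items() if len(v) < min_size}
    return testable, too_small

testable, too_small = split_by_group_size(rows)
print(sorted(testable))   # industries that can go into the test
print(sorted(too_small))  # industries that would fall back to the overall average
```

Doing this filtering before the test means new data with a sparse industry should not break the flow; the small groups simply use the overall average.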

The Test of Means tool compares either two groups or each group against a control group. I think you have basically two options:

- compare against an "average for all industries" - in my opinion this makes sense if the industries' shares of rows do not differ extremely (an extreme case would be e.g. A 10%, B 5%, C 85% of rows)

- compare each industry against each other by running a series of Test of Means tools, each time selecting one industry as the control group and all others as treatment groups (maybe using a macro) - this will be a bit more complicated
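As a rough illustration of the first option, the sketch below computes a Welch two-sample t statistic for each industry against all rows as the control group. I am assuming here that the tool is based on a Welch-type t-test; the data and the function name `welch_t` are hypothetical. It uses only the Python standard library, so it stops at the t statistic and the degrees of freedom; getting a p-value needs a t distribution (for example `scipy.stats.ttest_ind(a, b, equal_var=False)`).

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample, control):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n1, n2 = len(sample), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two values")
    # Squared standard errors of the two group means.
    v1, v2 = variance(sample) / n1, variance(control) / n2
    if v1 + v2 == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(sample) - mean(control)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical task times in minutes per industry.
times = {
    "A": [12.0, 15.0, 11.0, 14.0],
    "B": [20.0, 22.0, 19.0],
    "C": [13.0, 16.0, 15.0],
}
all_rows = [x for values in times.values() for x in values]  # control group

results = {name: welch_t(values, all_rows) for name, values in times.items()}
for name, (t, df) in results.items():
    print(f"{name}: t = {t:.2f}, df = {df:.1f}")
```

Note that the control group here contains the industry itself, so the two samples overlap. A stricter variant would compare each industry against all other rows instead.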

Hope this helps in any way.

Best,

Roland

enylen · ‎12-19-2019

Great,

Thanks for the reply! I managed to erase my own response.

Sometimes the shares of the industries differ extremely due to a lack of data. This is due to randomness; the population does not differ that much. However, no industry is dominant compared to the others in the sample. (Sample: Industry A 10%, Industry B 10%, Industry C 10%, Industry D 3%, Industry E 1%, Industry F 20%, etc.)

I would like to avoid running a series of Test of Means tools if possible, since that would create a spider web of flows. I have to do this for a lot of tasks, and not all tasks will be grouped by industry; some will be grouped by other things.

What do you think?

RolandSchubert · ‎12-19-2019

I think that, if there is no dominant industry, comparing against an average should work. As I understand your intention, you want to analyze the influence of industry on the time needed for a set of tasks. A comparison to the average will show each industry as "above average" or "below average" to a certain degree, so you get a kind of ordering (high to low industry-specific effort) and a distance to the average, which should be sufficient to estimate the expected effort of a task (I assume you are doing something like a precalculation).
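The ordering and distance I mean can be sketched in a few lines of Python. The numbers and names below are invented for illustration; the overall average is taken over all rows, so an industry with a large share of rows pulls it toward its own mean.

```python
from statistics import mean

# Hypothetical task times in minutes per industry.
times = {
    "A": [12.0, 15.0, 11.0, 14.0],
    "B": [20.0, 22.0, 19.0],
    "C": [13.0, 16.0, 15.0],
}

# Row-weighted average over all industries.
overall = mean([x for values in times.values() for x in values])

# Signed distance of each industry mean from the overall average.
distance = {name: mean(values) - overall for name, values in times.items()}

# Highest industry-specific effort first.
ranked = sorted(distance.items(), key=lambda item: item[1], reverse=True)
for name, diff in ranked:
    label = "above" if diff > 0 else "below"
    print(f"{name}: {abs(diff):.2f} minutes {label} average")
```

Whether a given distance is large enough to justify an industry-specific value is still a question for the test itself; this only gives the ordering.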


Errors and questions regarding the Test of Means tool