Alteryx Designer Discussions

Clustering using Ward's Method

rashi_06
5 - Atom

Hi

 

I have a dataset that shows the number of hours worked every day in a particular geography. For example:

 

Geo   1-12-19   2-12-19   3-12-19   4-12-19
US    10        8         16        10
UK    12        24        14        11
IN    14        10        18        13
FR    16        20        25        17

 

I would like to group these countries so that, on any given day, the total hours worked by all geographies in a group is almost the same, i.e. there is minimum variance between the total number of hours worked.

 

I read online that I can create clusters using Ward's method to ensure minimum variance but can't find it in Alteryx.

 

Thanks for the help!

3 REPLIES
RolandSchubert
15 - Aurora

Hi @rashi_06 ,

 

Unfortunately, hierarchical clustering (including Ward's method) is not implemented as a tool in Alteryx. If you want to use this approach, your only option is the Python or R tool (I think the method is available in both; in Python it is part of scikit-learn).

 

Within Ward's method, "minimum variance" is the criterion used to build the clusters: starting with clusters that each contain one element, new clusters are created by merging the pair whose combination gives the smallest increase in within-cluster variance, until all elements form a single cluster. That does not necessarily ensure optimal clustering, so I would recommend trying k-Means (which is available in Alteryx) before building something in R/Python.
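
To make the merging rule concrete, here is a minimal pure-Python sketch of Ward's method. It assumes the sample table in the question reads as US 10/8/16/10, UK 12/24/14/11, IN 14/10/18/13, FR 16/20/25/17, and the helper names (`centroid`, `ward_cost`, `ward_clusters`) are made up for illustration. For real data, scikit-learn's `AgglomerativeClustering(linkage="ward")` would be the library route inside the Python tool.

```python
from itertools import combinations

# Hours per day for each geography (assumed reading of the sample table).
hours = {
    "US": [10, 8, 16, 10],
    "UK": [12, 24, 14, 11],
    "IN": [14, 10, 18, 13],
    "FR": [16, 20, 25, 17],
}

def centroid(members):
    """Mean daily-hours vector of the geographies in one cluster."""
    rows = [hours[m] for m in members]
    return [sum(col) / len(rows) for col in zip(*rows)]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_clusters(n_clusters):
    """Start with one cluster per geography and merge until n_clusters remain."""
    clusters = [(name,) for name in hours]
    while len(clusters) > n_clusters:
        # Pick the pair whose merge adds the least within-cluster variance.
        a, b = min(combinations(clusters, 2), key=lambda pair: ward_cost(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return clusters

print(ward_clusters(2))
```

Note that this groups geographies with similar daily profiles. Whether that also evens out the group totals depends on the data, which is one reason the result is not guaranteed to be optimal for your goal.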

 

Hope this helps.

 

Best,

 

Roland

rashi_06
5 - Atom

Hi Roland

 

I did try K-means, but the variance within each group is still quite high. I have daily data for 2 months, and it will keep getting updated.

 

Is there any other way I should look at my data to reduce variance? 

RolandSchubert
15 - Aurora

Have you already tried increasing the number of clusters? Changing the clustering method (K-Means, K-Medians, Neural Gas) or the number of starting seeds could also improve the clustering. One question: is "number of hours worked" the only variable you are using to create the clusters? Maybe you could also use the time information (e.g. day of week, weekend).
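
As a rough sketch of the time-information idea, the date headers could be turned into extra variables like this. It assumes the headers are in day-month-year order (so "1-12-19" is 1 December 2019); if they are month-day-year, the format string needs to change. The function name `time_features` is hypothetical.

```python
from datetime import datetime

# Date labels as they appear in the sample table.
dates = ["1-12-19", "2-12-19", "3-12-19", "4-12-19"]

def time_features(label, fmt="%d-%m-%y"):
    """Derive day-of-week variables from a date label (assumed day-month-year)."""
    day = datetime.strptime(label, fmt)
    return {
        "date": label,
        "weekday": day.weekday(),         # 0 = Monday ... 6 = Sunday
        "is_weekend": day.weekday() >= 5,  # Saturday or Sunday
    }

for label in dates:
    print(time_features(label))
```

These columns could then be joined to the hours data before clustering, or used to cluster weekdays and weekends separately.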
