Hi, has anyone seen this type of error before?
I am trying to identify the best number of clusters with the K-Centroid Diagnostics tool. After getting the data ready, I found that this error message kept showing up after clicking Run.
I would really appreciate some advice.
Hi @SophinL, this isn't an error that I have bumped into before, but I have since done some investigation and have been able to replicate the issue.
It seems likely that the cause of this issue is a lack of variance between your data points: there are too few distinct points to fill the number of clusters being requested, so the max number of clusters should be adjusted to a lower value.
For instance, I created an example where I passed in 999 records with three columns: 333 of them had the values 1, 2 and 3, another 333 had the values 4, 5 and 6, and the last 333 had the values 7, 8 and 9. Clearly, for clustering, there are three groups that should be generated, and that is also the most that can be generated, as the data can't be split into any further groups.
If I set the max number of clusters to 4, I get this error; however, if I look for between 2 and 3 clusters, it works.
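To check this outside the tool, here is a rough Python sketch that rebuilds the example above and counts how many distinct rows it contains. It assumes the error comes from asking for more clusters than there are distinct points, which is what the test suggests but is not confirmed from the tool's internals. The helper names (max_usable_clusters, clamp_cluster_range) are made up for illustration and are not part of the tool.

```python
# Rebuild the toy data: 999 records, three columns, only three distinct rows.
records = [(1, 2, 3)] * 333 + [(4, 5, 6)] * 333 + [(7, 8, 9)] * 333


def max_usable_clusters(rows):
    """Upper bound on non-empty clusters: the number of distinct rows."""
    return len(set(rows))


def clamp_cluster_range(rows, min_k, max_k):
    """Hypothetical helper: cap the requested range at the distinct-row count."""
    limit = max_usable_clusters(rows)
    if limit < min_k:
        raise ValueError(
            f"only {limit} distinct rows, fewer than the minimum of {min_k} clusters"
        )
    return min_k, min(max_k, limit)


distinct = max_usable_clusters(records)
safe_range = clamp_cluster_range(records, 2, 4)

print(distinct)    # 3
print(safe_range)  # (2, 3)
```

Running the same distinct-row count on your own data before the diagnostics step should give a sensible ceiling for the max number of clusters.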
So it may be worth looking into your data and reducing the max number of clusters that you are looking to diagnose. If the number is already low, then it suggests the complexity of the problem is quite small.
Ben