Hello everyone.
I am having some trouble with clustering. I recently started using Alteryx, so I am hoping someone has an answer to my problem :)
Here is the data I am trying to analyze (an Excel file):
- Sheet #1: locations, with the latitude of each location in column A and its longitude in column B
- Sheet #2: other locations (different from those in Sheet #1), with the latitude of each location in column A and its longitude in column B
[Image: illustrative example of the Excel file format, where A, B, C, X, Y and Z are the latitude and longitude values of each location]
I am trying to perform two types of analysis:
- The first uses only the data in Sheet #1
- The second combines the data in Sheet #1 with the data in Sheet #2
--------------------------------------------------
First analysis:
The goal is to create clusters by merging locations within an [X] km radius, with at most [Y] locations merged into each cluster.
I want to be able to change [X] and [Y] so that I can compare several outcomes of the analysis.
I tried the "find nearest" tool, but its output does not seem to let me create clusters. For example:
- Location A is within [X] km of location B, so one might think we could create a cluster at location A (including location B in it)
- But location B is also within [X] km of locations C and D, so the cluster should instead be at location B (including locations A, C and D in it)
The "find nearest" output does not capture this kind of situation, so I was not able to build clusters with it (there may be another way to do it with the "find nearest" tool, but I have not found it).
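The A/B/C/D situation above is really a "densest centre first" problem rather than a nearest-neighbour one. Outside Alteryx, a minimal greedy sketch in Python could look like the following. It assumes coordinates in decimal degrees, great-circle (haversine) distance on a sphere with a mean Earth radius of 6371 km, and that [Y] counts the locations merged into a centre, not including the centre itself. The names `haversine_km` and `cluster_locations` are hypothetical, the coordinates are made up, and the greedy choice is a heuristic, not a guaranteed optimal clustering.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption: spherical Earth)


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) pairs in decimal degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cluster_locations(points, radius_km, max_merged):
    """Greedy radius clustering (hypothetical helper, heuristic only).

    points: dict of name -> (lat, lon).
    Returns dict of centre name -> list of names merged into that centre.
    """
    unassigned = set(points)
    clusters = {}
    while unassigned:
        best_centre, best_members = None, []
        # Try every unassigned location as a centre and keep the one that
        # can absorb the most unassigned neighbours within the radius.
        for name in sorted(unassigned):
            neighbours = []
            for other in unassigned:
                if other == name:
                    continue
                d = haversine_km(points[name], points[other])
                if d <= radius_km:
                    neighbours.append((d, other))
            neighbours.sort()  # nearest first
            members = [other for _, other in neighbours[:max_merged]]
            if best_centre is None or len(members) > len(best_members):
                best_centre, best_members = name, members
        clusters[best_centre] = best_members
        unassigned -= {best_centre, *best_members}
    return clusters


# Made-up coordinates reproducing the A/B/C/D example with a 10 km radius:
# B is close to A, C and D, while A is only close to B.
locations = {
    "A": (0.0, -0.05),
    "B": (0.0, 0.0),
    "C": (0.0, 0.05),
    "D": (0.06, 0.02),
}
print(cluster_locations(locations, radius_km=10, max_merged=3))
```

With these coordinates, B has three neighbours within 10 km while A has only one, so B becomes the centre of a cluster containing A, C and D. Re-running with different `radius_km` and `max_merged` values gives the outcomes for other [X] and [Y]. The pairwise loops are fine for small sheets but grow quickly with the number of locations.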
I then tried the K-Centroids Cluster Analysis tool (I do not know whether it is the right tool for what I am trying to do), but it does not seem to accept my data. As the screenshot below shows, I cannot select any fields:
[Screenshot: K-Centroids Cluster Analysis configuration, with no fields available for selection]
I have also tried clicking "Run", but it always shows an error.
Also, as mentioned above, I am not sure this is the right tool, so if there is a better one, please do not hesitate to share your thoughts.
Second analysis:
This analysis would produce two different outputs:
- First output: create clusters by merging locations from Sheet #2 into locations from Sheet #1 within an [X] km radius, with at most [Y] locations merged into each cluster, but without merging Sheet #1 locations with each other
- Second output: create clusters by merging locations from both Sheet #1 and Sheet #2 within an [X] km radius, with at most [Y] locations merged into each cluster
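For the first output, each Sheet #2 location only needs to be attached to a Sheet #1 location. A minimal greedy sketch in Python, assuming haversine distance in decimal degrees and that [Y] is the number of Sheet #2 locations merged into one Sheet #1 location: all pairs within the radius are processed shortest distance first, and a Sheet #2 location goes to the closest Sheet #1 location that still has room. Returning Sheet #2 locations with no Sheet #1 location in range as a separate "unmatched" list is an assumption about how they should be handled. `merge_into_anchors` is a hypothetical name, and the coordinates are made up: A and B (Sheet #1) are about 5.6 km apart, while C and D (Sheet #2) lie within 10 km of A but not of B.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption: spherical Earth)


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) pairs in decimal degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def merge_into_anchors(anchors, others, radius_km, max_merged):
    """Attach Sheet #2 locations (others) to Sheet #1 locations (anchors).

    Sheet #1 locations are never merged with each other.
    Returns (clusters, unmatched): clusters maps each anchor to the list of
    merged Sheet #2 names; unmatched lists Sheet #2 names left on their own.
    """
    pairs = []
    for a_name, a_pos in anchors.items():
        for o_name, o_pos in others.items():
            d = haversine_km(a_pos, o_pos)
            if d <= radius_km:
                pairs.append((d, o_name, a_name))
    pairs.sort()  # closest pairs first
    clusters = {a_name: [] for a_name in anchors}
    assigned = set()
    for _, o_name, a_name in pairs:
        # Skip if this Sheet #2 location is already taken or the anchor is full.
        if o_name in assigned or len(clusters[a_name]) >= max_merged:
            continue
        clusters[a_name].append(o_name)
        assigned.add(o_name)
    unmatched = sorted(set(others) - assigned)
    return clusters, unmatched


sheet1 = {"A": (0.0, 0.0), "B": (0.0, 0.05)}
sheet2 = {"C": (0.0, -0.05), "D": (-0.05, -0.03)}
print(merge_into_anchors(sheet1, sheet2, radius_km=10, max_merged=3))
```

Because Sheet #1 locations are only ever used as anchors, A and B stay separate even though they are within the radius of each other, which is the "no clustering of Sheet #1 locations" rule.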
For example, suppose that:
- Locations A and B from Sheet #1 are within [X] km of each other
- Location A from Sheet #1 is within [X] km of locations C and D from Sheet #2
- Location B from Sheet #1 is not within [X] km of any location from Sheet #2
The outputs would then be:
- First output: a cluster at location A that includes locations C and D (Sheet #1 locations are not clustered together, so location B remains as-is)
- Second output: a cluster at location A that includes locations B, C and D
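For the second output, locations from both sheets can be pooled and clustered with the same densest-centre-first idea. Because both expected outputs in the example above keep the cluster at location A, this Python sketch assumes that cluster centres must be Sheet #1 locations; it also assumes location names are unique across the two sheets and that [Y] excludes the centre. All three are assumptions, not something stated in the post. `cluster_pooled` is a hypothetical name, and the coordinates are made up to match the example (A and B about 5.6 km apart, C and D within 10 km of A only).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption: spherical Earth)


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) pairs in decimal degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cluster_pooled(anchors, others, radius_km, max_merged):
    """Cluster Sheet #1 and Sheet #2 locations together, centres from Sheet #1 only.

    Returns (clusters, leftover): clusters maps each centre to its merged names;
    leftover lists locations that were not absorbed by any centre.
    """
    points = {**anchors, **others}
    unassigned = set(points)
    clusters = {}
    while True:
        best_centre, best_members = None, []
        # Only unassigned Sheet #1 locations may act as centres.
        for name in sorted(set(anchors) & unassigned):
            neighbours = []
            for other in unassigned:
                if other == name:
                    continue
                d = haversine_km(points[name], points[other])
                if d <= radius_km:
                    neighbours.append((d, other))
            neighbours.sort()  # nearest first
            members = [other for _, other in neighbours[:max_merged]]
            if best_centre is None or len(members) > len(best_members):
                best_centre, best_members = name, members
        if best_centre is None:  # no Sheet #1 location left to use as a centre
            break
        clusters[best_centre] = best_members
        unassigned -= {best_centre, *best_members}
    # Whatever is left can only be Sheet #2 locations that no centre absorbed.
    return clusters, sorted(unassigned)


sheet1 = {"A": (0.0, 0.0), "B": (0.0, 0.05)}
sheet2 = {"C": (0.0, -0.05), "D": (-0.05, -0.03)}
print(cluster_pooled(sheet1, sheet2, radius_km=10, max_merged=3))
```

With a 10 km radius, A can absorb B, C and D while B can only reach A, so the result is a single cluster at A containing B, C and D, matching the expected second output.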
--------------------------------------------------
I am not sure my explanations are clear enough, so feel free to ask me further questions if needed.
From a new Alteryx user: thank you very much for your time.
Best regards.