Alteryx Designer Desktop Knowledge Base

Performing Partitioning Cluster Analysis in Alteryx Designer (Predictive Grouping)

PamW
Alteryx Alumni (Retired)

Predictive Grouping is an approach that lets users assess and create the appropriate number of clusters (groups) for their data, so that records assigned to the same cluster are similar to each other and dissimilar to records assigned to other clusters. K-Centroids represent a class of algorithms for doing what is known as partitioning cluster analysis. These methods work by taking the records in a database and dividing (partitioning) them into the best K groups based on some criteria. The purpose of creating clusters is to support business decisions about the clustered data.

Clustering could apply to stores, customers, vendors, products or all of the above. All of the variables selected for your cluster analysis must be numerical.

The following is a high-level description of the K-Centroids Tools used for Predictive Grouping:

  • K-Centroids Diagnostics - The K-Centroids Diagnostics tool is designed to allow the user to make an assessment of the appropriate number of clusters to specify given the data and the selected clustering algorithm (K-Means, K-Medians, or Neural Gas).

    The tool is graphical and is based on calculating two different statistics over bootstrap replicate samples of the original data for a range of clustering solutions that differ in the number of clusters specified. The motivation behind this approach is that if the records in a database truly fall into a set of stable clusters, then different random samples of those records should yield approximately the same set of clusters across the bootstrap replicates, except for small differences due to random sampling variability and to the randomness in the method used to generate the starting set of centroids (selecting K points at random) in the general K-Centroids algorithm.

    The two measures examined are the adjusted Rand index and the Calinski–Harabasz index (also known as the variance ratio criterion and the pseudo-F statistic).

  • K-Centroids Cluster Analysis - K-Centroids represent a class of algorithms for doing what is known as partitioning cluster analysis. These methods work by taking the records in a database and dividing (partitioning) them into the best K groups based on some criteria. Nearly all the partitioning cluster analysis methods accomplish their objective by basing cluster membership on the proximity of each record to one of K points (or centroids) in the data. The objective of these clustering algorithms is to find the location of the centroids that optimizes some criteria with respect to the distance between the centroid of a cluster and the points assigned to that cluster for a pre-specified number of clusters in the data. The specific algorithms differ from one another in both the criteria used to define a cluster centroid and the distance measures used to define the proximity of a point in a cluster to that cluster’s centroid. Three specific types of K-Centroids cluster analysis can be carried out with this tool: K-Means, K-Medians, and Neural Gas clustering.

  • Append Cluster - The Append Cluster tool appends the cluster assignments from the K-Centroids Cluster Analysis tool to the data.

Alteryx customers use predictive analytics to find patterns in historical and transactional data that reveal risks as well as opportunities.
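To make the partitioning idea concrete, here is a minimal sketch of K-Means in plain Python, followed by a step that mimics what Append Cluster does. It is an illustration only and not the R code the Alteryx macros run. The function names (`nearest`, `k_means`, `append_cluster`) and the tiny `stores` data set are hypothetical, and the sketch assumes squared Euclidean distance with centroids defined as cluster means.

```python
import random

def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((p - q) ** 2 for p, q in zip(point, c))
    return min(range(len(centroids)), key=lambda j: sq_dist(centroids[j]))

def k_means(records, k, max_iter=100, seed=0):
    """Partition `records` (tuples of numbers) into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(records, k)  # starting centroids: K records picked at random
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins its nearest centroid.
        new_labels = [nearest(r, centroids) for r in records]
        if new_labels == labels:
            break  # assignments stopped changing, so the solution is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels

def append_cluster(records, centroids):
    """Return each record with its cluster number appended as a final field."""
    return [r + (nearest(r, centroids),) for r in records]

# Hypothetical store metrics (two numeric variables per store).
stores = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (8.0, 8.0), (8.5, 7.5), (7.8, 8.3)]
centroids, labels = k_means(stores, k=2)
for row in append_cluster(stores, centroids):
    print(row)
```

K-Medians follows the same loop but, in its usual formulation, replaces the mean with a component-wise median and the Euclidean distance with Manhattan distance. That is the kind of difference in centroid definition and distance measure described above.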

Alteryx predictive analytics tools are built on open-source R. Alteryx users are not required to know R to execute predictive models because all of the models in Alteryx are packaged into easy-to-use macro tools that only require configuration. All predictive tools are macros, and therefore not a black box. Macros give the user the flexibility to open any model and dissect its logic, as well as see and modify the R script(s) being executed. This video provides a brief tutorial on using the Predictive Grouping tools on retail store metrics and surrounding demographics.

The following tools are demonstrated: K-Centroids Diagnostics, K-Centroids Analysis, Append Cluster
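For readers who want to see what the two statistics reported by the K-Centroids Diagnostics tool measure, the sketch below computes them in plain Python using their standard textbook definitions. It is not the tool's implementation. The function names and the sample data are hypothetical, and the sketch scores a single labeling rather than a set of bootstrap replicates.

```python
from collections import Counter
from math import comb

def calinski_harabasz(records, labels):
    """Variance ratio: between-cluster over within-cluster dispersion (needs 2 <= k < n)."""
    n, dims = len(records), len(records[0])
    clusters = sorted(set(labels))
    k = len(clusters)
    overall = [sum(r[d] for r in records) / n for d in range(dims)]
    between = within = 0.0
    for c in clusters:
        members = [r for r, lab in zip(records, labels) if lab == c]
        center = [sum(m[d] for m in members) / len(members) for d in range(dims)]
        between += len(members) * sum((center[d] - overall[d]) ** 2 for d in range(dims))
        within += sum((m[d] - center[d]) ** 2 for m in members for d in range(dims))
    return (between / (k - 1)) / (within / (n - k))

def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two labelings of the same records, corrected for chance."""
    n = len(labels_a)
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:  # both labelings are trivial (e.g. a single cluster)
        return 1.0
    return (pairs_both - expected) / (maximum - expected)

# Hypothetical data: two tight, well-separated groups of three records each.
points = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (8.0, 8.0), (8.5, 7.5), (7.8, 8.3)]
good_split = [0, 0, 0, 1, 1, 1]
poor_split = [0, 1, 0, 1, 0, 1]

print(calinski_harabasz(points, good_split))
print(calinski_harabasz(points, poor_split))
print(adjusted_rand_index(good_split, [1, 1, 1, 0, 0, 0]))  # same grouping, renamed
print(adjusted_rand_index(good_split, poor_split))
```

A higher Calinski–Harabasz value indicates tighter, better-separated clusters. An adjusted Rand index near 1 means two clustering runs grouped the records the same way, which is the stability across bootstrap replicates that the diagnostics tool looks for when comparing candidate values of K.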

Comments

In my tool palette, I don't see the Predictive Grouping tools. How do I install them? Should I install R first? Please help.

NeilR
Alteryx Alumni (Retired)

@_arun_gurubaramurugeshan

First, make sure they're not just hiding. Click the + button at the far right end of your tool categories and make sure the Predictive Grouping category is enabled. If you don't see it there, you need to install the predictive tools. Go to downloads.alteryx.com and click Alteryx Designer. Find and click the version of Designer you're using, either the current release under the New Versions tab or an older version under the Previous Versions tab. Then download and install the Alteryx Predictive Tools, either the Admin or Non-Admin version, depending on what type of Designer you have installed.