Many features come in the form of categorical variables. It would be amazing to have a set of tools for clustering and dimensionality reduction on categorical variables.
The current tool set in Alteryx is fantastic for working with continuous variables (k-centroids, KNN, PCA), but falls short when working with categorical variables.
There are some ways to do dimensionality reduction on categorical variables (Multiple Correspondence Analysis, PCA with Gower's distance, etc.) and some ways to cluster them (k-modes, working with medoids instead of centroids as in PAM, etc.).
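To illustrate the k-modes idea: it keeps the k-means loop (assign each record to the nearest center, then recompute the centers) but replaces Euclidean distance with a count of mismatched categories and replaces the mean with the per-column most frequent value. Below is a minimal pure-Python sketch of that loop, not how Alteryx or any particular library implements it. For brevity it assumes a naive initialization (the first k distinct records), whereas published k-modes variants use smarter seeding; the function names are illustrative only.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Row = Tuple[str, ...]


def matching_dissimilarity(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of attributes on which two records disagree."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows: List[Row]) -> Row:
    """Most frequent category in each column (ties go to the first value seen)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(rows: List[Row], k: int, max_iter: int = 100) -> Tuple[List[int], List[Row]]:
    # Naive initialization (an assumption for this sketch): first k distinct records.
    modes: List[Row] = []
    for r in rows:
        if r not in modes:
            modes.append(r)
        if len(modes) == k:
            break
    if len(modes) < k:
        raise ValueError("fewer than k distinct records")

    labels = [-1] * len(rows)
    for _ in range(max_iter):
        # Assignment step: each record joins the mode it mismatches least.
        new_labels = [
            min(range(k), key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in rows
        ]
        if new_labels == labels:
            break  # converged: no record changed cluster
        labels = new_labels
        # Update step: each mode becomes the column-wise mode of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                modes[j] = column_modes(members)
    return labels, modes


rows = [
    ("red", "small", "round"),
    ("blue", "large", "square"),
    ("red", "small", "oval"),
    ("blue", "large", "rect"),
    ("red", "medium", "round"),
    ("green", "large", "square"),
]
labels, modes = k_modes(rows, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
print(modes)   # [('red', 'small', 'round'), ('blue', 'large', 'square')]
```

Because the centers are actual category values rather than averages, the resulting modes stay interpretable as "typical" records, which is part of the appeal over forcing categorical data through dummy-coded k-means.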
Key considerations when choosing an algorithm are time complexity, validity of results, and whether the algorithm works only on categorical variables or on a mix of categorical and continuous variables.
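On the mixed-data point, Gower's distance is one common way to compare records that have both categorical and continuous fields: each categorical field contributes 0 for a match and 1 for a mismatch, each continuous field contributes its absolute difference divided by that field's range, and the contributions are averaged. The sketch below computes such a dissimilarity matrix in pure Python, which could then feed a medoid-based method like PAM. Assumptions made here: all fields are weighted equally, a column is treated as continuous only if every value in it is a number, a constant numeric column contributes 0, and the helper names are hypothetical.

```python
from typing import List, Optional, Sequence, Union

Value = Union[str, float]


def _field_dissimilarity(a: Value, b: Value, rng: Optional[float]) -> float:
    """Per-field Gower term: match/mismatch for categories, range-scaled gap for numbers."""
    if rng is None:  # categorical field
        return 0.0 if a == b else 1.0
    return abs(a - b) / rng if rng > 0 else 0.0  # constant column: treat as identical


def gower_matrix(rows: List[Sequence[Value]]) -> List[List[float]]:
    n_cols = len(rows[0])
    # Assumption: a column is numeric only if every value in it is an int or float.
    ranges: List[Optional[float]] = []
    for c in range(n_cols):
        col = [r[c] for r in rows]
        if all(isinstance(v, (int, float)) for v in col):
            ranges.append(float(max(col) - min(col)))
        else:
            ranges.append(None)

    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Equal weights: average the per-field dissimilarities.
            total = sum(
                _field_dissimilarity(rows[i][c], rows[j][c], ranges[c])
                for c in range(n_cols)
            )
            dist[i][j] = dist[j][i] = total / n_cols
    return dist


customers = [
    ("retail", "north", 20.0),
    ("retail", "south", 30.0),
    ("wholesale", "north", 60.0),
]
D = gower_matrix(customers)
for row in D:
    print([round(d, 3) for d in row])
```

Every entry lands in [0, 1] regardless of the units of the continuous fields, which is what lets categorical and continuous variables be combined on an equal footing.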
- Michael Dyatchenko