
Alteryx Designer Desktop Discussions

SOLVED

Clustering a dataset which has a combination of Categorical and Continuous Features

CM100
7 - Meteor

Hi Everyone, 

 

I am working on an insurance dataset that has both continuous features, such as car mileage and driver age, and categorical features, such as gender and car brand.

 

I have tried a K-means model on the continuous features only (excluding the categorical ones) and K-modes on the categorical features only (excluding the continuous ones). Kudos to Brian for this excellent post: https://www.thedataschool.co.uk/brian-scally/clustering-categorical-data-in-alteryx/
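For context, K-modes differs from K-means in two places: it compares categorical records by counting mismatched attributes (simple matching dissimilarity) instead of using Euclidean distance, and it represents each cluster by the most frequent category in each column instead of the mean. A minimal Python sketch of just those two pieces, assuming each record is a tuple of category strings; the function names are hypothetical and are not taken from Brian's post or from any Alteryx tool:

```python
from collections import Counter
from typing import Sequence


def matching_dissimilarity(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the attributes on which two categorical records disagree."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(1 for x, y in zip(a, b) if x != y)


def mode_centroid(records: Sequence[Sequence[str]]) -> list[str]:
    """Build a cluster 'centroid' from the most frequent category in each column."""
    if not records:
        raise ValueError("cannot compute the mode of an empty cluster")
    return [Counter(column).most_common(1)[0][0] for column in zip(*records)]


# Hypothetical cluster of (gender, car brand) records
cluster = [("M", "Ford"), ("F", "Ford"), ("M", "BMW")]
centroid = mode_centroid(cluster)
print(centroid)                                        # ['M', 'Ford']
print(matching_dissimilarity(("F", "BMW"), centroid))  # 2
```

A full K-modes loop alternates these two steps (assign each record to the centroid with the fewest mismatches, then recompute the modes), the same way K-means alternates nearest-mean assignment and averaging.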

 

I was wondering if there is a way or workflow to handle both types of feature together?

 

Cheers

Cedric

 

 

mceleavey
17 - Castor

Hi @CM100 ,

 

Different models require different treatments of categorical variables. I've attached a tool I developed that uses one-hot encoding to create a binary grid of all the categorical variables.

For example, suppose you have a column with the country name/code. You would "binarize" it into one column per country; for a record whose country is UK, the result looks like this:

 

Germany    UK    France
0          1     0

 

This means that, rather than multi-valued categorical columns, you have Boolean (0/1) columns.
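The same binarization can be sketched in a few lines of plain Python. This is not the attached tool, only an illustration assuming the categorical column arrives as a list of strings; `one_hot` is a hypothetical helper. Columns are sorted alphabetically here, so their order differs from the Germany/UK/France layout in the example.

```python
from typing import Sequence


def one_hot(values: Sequence[str]) -> tuple[list[str], list[list[int]]]:
    """Turn one categorical column into a 0/1 column per distinct category."""
    categories = sorted(set(values))  # sorted so the column order is reproducible
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


header, grid = one_hot(["UK", "Germany", "France", "UK"])
print(header)   # ['France', 'Germany', 'UK']
print(grid[0])  # [0, 0, 1]
```

Each output row contains exactly one 1, which is what lets the multi-valued column be replaced by Boolean columns. In pandas, `pandas.get_dummies` performs the same transformation on a DataFrame column.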

 

I hope this helps.

 

M.




CM100
7 - Meteor

Hi @mceleavey, 

 

Thanks for the tool; this will come in very handy.

However, once the categorical features are encoded as 1/0 binaries, will a K-means model be applicable? The 1/0 values may not have an ordinal relationship.
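One way to check the ordinality concern: K-means works with Euclidean distance, and under that distance any two different one-hot rows are exactly sqrt(2) apart, so no category is treated as "between" two others. A single integer code, by contrast, ranks the categories arbitrarily. A small Python illustration, using the three countries from the earlier one-hot example as assumed data:

```python
import math


def euclidean(a: list[int], b: list[int]) -> float:
    """Straight-line (Euclidean) distance between two rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# One-hot rows, columns ordered France, Germany, UK
france, germany, uk = [1, 0, 0], [0, 1, 0], [0, 0, 1]
pairs = [(france, germany), (france, uk), (germany, uk)]

# Every pair of different categories is the same distance apart.
print([round(euclidean(a, b), 4) for a, b in pairs])  # [1.4142, 1.4142, 1.4142]

# Contrast: an integer code puts France twice as far from UK as from Germany.
ordinal = {"France": 0, "Germany": 1, "UK": 2}
print(abs(ordinal["France"] - ordinal["UK"]), abs(ordinal["France"] - ordinal["Germany"]))  # 2 1
```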

 

Thanks in advance !

 

mceleavey
17 - Castor

Hi @CM100 ,

 

Yes, that's correct: ordinal encoding is a different approach from one-hot encoding. However, once you have created the binary grid with one-hot encoding, you can run all the categorical variables through the model together with the numeric variables, so I'm not sure you would need ordinal encoding at all. Try that first; if you then decide you do need ordinal encoding, as far as I'm aware you will need to create it manually.
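To make the "run everything together" suggestion concrete, here is a minimal pure-Python sketch, not an Alteryx workflow: the numeric columns are standardized, the one-hot columns are appended, and plain K-means (Lloyd's algorithm) runs on the combined rows. Standardizing is an added assumption here, so that mileage in the tens of thousands does not swamp the 0/1 columns; the records, helper names, and parameters are all hypothetical.

```python
import math
import random
from typing import Sequence


def standardize(column: Sequence[float]) -> list[float]:
    """Rescale a numeric column to mean 0 and standard deviation 1."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column)) or 1.0
    return [(x - mean) / std for x in column]


def kmeans(points: list[list[float]], k: int, iters: int = 50, seed: int = 0) -> list[int]:
    """Plain Lloyd's K-means; returns a cluster label for each point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical records: (car mileage, driver age, gender)
records = [(5000, 22, "M"), (6000, 25, "M"), (52000, 61, "F"), (50000, 58, "F")]
mileage = standardize([r[0] for r in records])
age = standardize([r[1] for r in records])
genders = sorted({r[2] for r in records})

# Combined rows: scaled numeric features followed by one-hot gender columns.
features = [
    [m, a] + [1.0 if r[2] == g else 0.0 for g in genders]
    for m, a, r in zip(mileage, age, records)
]
labels = kmeans(features, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is which records share a label. How much weight the 0/1 columns carry relative to the scaled numeric columns is a modelling choice worth experimenting with.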

 

M.



