Alteryx Designer Desktop Discussions

Support vector machine with and without dummy variable conversions

gakkos2323
7 - Meteor

Hello All,

I am not new to Alteryx, but I'm new to the SVM tool in Alteryx. In a perfect world, you need to convert categorical variables into dummy variables (one-hot encoding). However, I see that some folks (in the examples that I've watched) do not do that and just pass categorical variables in as "string". My understanding is that the SVM tool in Alteryx takes care of one-hot encoding internally, maybe? Or does it just code them as integers, like 1-Female, 2-Male?

Attached is the workflow for my model. The first workflow was run with no conversion. In the second workflow, I converted the categorical variables into dummy variables and then ran SVM. If SVM in Alteryx does the encoding automatically, how come those two workflows do not give me the same output (matrix)? What am I missing here? Which approach is more appropriate on the Alteryx platform?

I guess all I am trying to understand is how SVM in Alteryx works. Based on my previous experience, we should always apply one-hot encoding unless the data is ordinal.
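
To illustrate why the choice of coding matters, here is a minimal Python sketch comparing integer coding with one-hot encoding for a hypothetical three-level variable. It does not reproduce the Alteryx SVM tool's internals, which this thread does not confirm; the level names and helper functions are made up for the example. SVM kernels are built from dot products or distances between rows, so two codings that change those distances can produce different models.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical categorical levels (assumption, not from the attached workflow).
levels = ["Red", "Green", "Blue"]

# Integer coding: Red -> 1, Green -> 2, Blue -> 3 (implies an ordering).
integer_codes = {level: [float(i + 1)] for i, level in enumerate(levels)}

# One-hot coding: one 0/1 column per level (no implied ordering).
one_hot_codes = {
    level: [1.0 if j == i else 0.0 for j in range(len(levels))]
    for i, level in enumerate(levels)
}

def pairwise_distances(codes):
    """Distance between every unordered pair of levels under a given coding."""
    return {
        (a, b): euclidean(codes[a], codes[b])
        for i, a in enumerate(levels)
        for b in levels[i + 1:]
    }

integer_distances = pairwise_distances(integer_codes)
one_hot_distances = pairwise_distances(one_hot_codes)

for pair in integer_distances:
    print(pair, "integer:", integer_distances[pair],
          "one-hot:", round(one_hot_distances[pair], 4))
```

Under integer coding, Red ends up twice as far from Blue as from Green, while one-hot encoding keeps every pair of levels at the same distance. For a two-level variable such as Female/Male the two codings differ only by a shift and scale, so the gap shows up mainly with three or more levels.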

2 REPLIES
BenMoss
ACE Emeritus

Hi!

I can’t give you a definitive answer, but my belief is that the predictive tools take your data as you give it, so I don’t believe the tool encodes the data for you.

One way to check how the tool works is to open it up (right-click and hit ‘Open Macro’). From there you may be able to look through the workflow, or the R script itself, to see whether this is handled in any way.

 

Ben

danilang
19 - Altair

Hi @gakkos2323

In some tools, such as the Linear Regression tool, string variables are automatically converted to the corresponding categorical variables using one-hot encoding. Unfortunately, I haven't been able to track down a definitive list of the tools that do and those that don't. Performing the encoding yourself is the best way to be sure. To help with this, @MarqueeCrew has released a new macro, CReW Generate Dummy Variables, that does the heavy lifting for you.
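
For readers who want to see what such a conversion produces, here is a minimal Python sketch of manual one-hot encoding. It is not the CReW macro or any Alteryx tool; the function name `one_hot_encode` and the sample records are hypothetical and only illustrate the idea.

```python
def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 column per distinct value.

    `rows` is a list of dicts; `column` is the key of the categorical field.
    Hypothetical helper for illustration only.
    """
    # Sort the distinct values so the output column order is reproducible.
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the categorical one being replaced.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded

# Made-up sample data (assumption, not taken from the attached workflow).
records = [
    {"age": 34, "gender": "Female"},
    {"age": 51, "gender": "Male"},
    {"age": 27, "gender": "Female"},
]

encoded_records = one_hot_encode(records, "gender")
for record in encoded_records:
    print(record)
```

Each output row carries a `gender_Female` and a `gender_Male` column, exactly one of which is 1. Feeding explicit 0/1 columns like these into a model removes any doubt about how a given tool treats string fields.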

 

Dan     
