Alteryx Designer


How do the prediction tools generate dummy variables?

6 - Meteoroid

I have noticed that, for example, the Linear Regression tool can automatically transform categorical variables into numerical values or vectors.

How exactly is this done?

Where can I find an explanation?


So far I have converted them "by hand", e.g.:

Assume we have one categorical feature with n different values A_1, ..., A_n.

I assigned to A_i the i-th standard basis vector (0, ..., 0, 1, 0, ..., 0), with the 1 in position i.
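
The by-hand conversion described above (one basis vector per category, often called one-hot encoding) can be sketched in plain Python. This is only an illustration: the function name `one_hot_encode` and the sample values are made up, and it says nothing about how the Alteryx tool itself works internally.

```python
def one_hot_encode(values):
    """Map each distinct category A_i to the i-th standard basis vector."""
    # Sort the distinct categories so the category-to-index mapping is deterministic.
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    n = len(categories)

    encoded = []
    for v in values:
        vec = [0] * n          # start with the zero vector of length n
        vec[index[v]] = 1      # set the single 1 at the category's position
        encoded.append(vec)
    return categories, encoded


# Hypothetical sample data, not taken from the original post.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
for value, vec in zip(colors, encoded):
    print(value, vec)
```

Each row gets a vector whose length equals the number of distinct categories, so a column with many distinct values produces many new columns.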


Toy example:


is transformed to



As a next step I would ask:

Where in the tool can I change the algorithm used for this transformation, e.g. to use some kind of binary encoding instead of the transformation above,

i.e. to implement a bijective map from my categories to some vector space over the field with two elements?
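
For comparison, here is a minimal sketch of the binary encoding idea mentioned above, again in plain Python. The function name `binary_encode` and the labels A_1 to A_5 are illustrative assumptions. Note that the map is injective in general and only bijective onto the whole space when the number of categories is a power of two.

```python
def binary_encode(values):
    """Assign each distinct category a fixed-width 0/1 vector (its index in binary)."""
    categories = sorted(set(values))
    n = len(categories)
    # Number of bits needed to give n categories distinct codes (at least 1).
    width = max(1, (n - 1).bit_length())

    codes = {}
    for i, cat in enumerate(categories):
        # Most significant bit first.
        codes[cat] = [(i >> b) & 1 for b in reversed(range(width))]
    return codes


# Hypothetical category labels, following the A_1, ..., A_n notation above.
codes = binary_encode(["A_1", "A_2", "A_3", "A_4", "A_5"])
for cat, vec in codes.items():
    print(cat, vec)
```

With five categories this needs 3 columns instead of 5. The trade-off is that each column is now shared by several unrelated categories, which makes the coefficients of a linear model harder to interpret than with one column per category.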

Alteryx Certified Partner

If you're looking for a process to transform categorical fields into dummy variables for modeling, I have attached a solution I built a while back. Let me know if this works for you. 



6 - Meteoroid

Hello CharlieS!


Thank you for your workflow, but I have already built such workflows on my own. Nevertheless, I will take a close look at your solution,

as one can always learn something new. :-)


However, the question remains:

How is this implemented in Alteryx, given that certain tools seem to do it automatically to some extent?


More precisely (although the following is simplified):

"By accident" I plugged some categorical data into the Linear Regression tool, and the output suggests that

the different entries in the categorical columns were identified and transformed.

To keep it simple: one categorical column was handled quite well, but another categorical column was "ignored".


As I do not know how this was done, I can only guess (hence this thread).

One reason could be: the "good" column contains only a few distinct categories, whereas the "bad" column contains many. Hence any statistic we apply is more reasonable for the "good" column.
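
One way to check the guess above is to count the distinct values in each categorical column before modeling. The sketch below uses made-up column names (`region`, `customer_id`) and made-up rows; it only measures cardinality and does not reproduce whatever the tool does internally.

```python
def distinct_counts(rows, columns):
    """Return the number of distinct values found in each of the given columns."""
    return {col: len({row[col] for row in rows}) for col in columns}


# Hypothetical data: one low-cardinality column and one high-cardinality column.
rows = [
    {"region": "north", "customer_id": "c001"},
    {"region": "south", "customer_id": "c002"},
    {"region": "north", "customer_id": "c003"},
    {"region": "south", "customer_id": "c004"},
]

counts = distinct_counts(rows, ["region", "customer_id"])
for col, n in counts.items():
    # A column whose distinct count is close to the row count has very few
    # observations per category, so per-category estimates are unreliable.
    print(col, n, "distinct values in", len(rows), "rows")
```

If the "bad" column turns out to have almost as many distinct values as there are rows, that would be consistent with the explanation suggested above.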