I am running a linear regression. One of my independent variables is categorical; the rest are continuous.
I had 6 standardized vehicle brands in that categorical variable. I mapped each brand to a numerical code (1-Honda, 2-Toyota, ...) and changed the data type to V_String for that variable. Then I ran the regression. Based on the output table, my understanding is that the regression took care of the dummy conversion automatically. Am I missing something here?
Are you using the Linear Regression tool or Assisted Modeling?
It should make them for you, but you can always check this in the output. You should see a variable created for each make with coefficients.
I used a small sample of vehicle data with the manufacturer name as a string field named "MAKE NAME". In the model report (R output anchor from the Linear Regression tool) you can see that a variable was created for each value, meaning a dummy was made.
Ignore the model, this was just a dummy test.
If you want to make your own ahead of time, that's also a good option. I like to do this because it gives me the opportunity to analyze the dummy variables outside of the Linear Regression tool. @MarqueeCrew recently released a macro to make this process super easy. Follow the link below to download the macro if you don't want to create the dummy values yourself:
According to the replies to this post by Alteryx's own @SydneyF, string variables are converted to the corresponding categorical variables using one-hot encoding in the Linear Regression tool. This conversion removes the need for you to perform the encoding yourself. The vehicle brand column will be automatically encoded to a binary column for each distinct value in the original column. Allowing the tool to perform the encoding directly also makes interpretation of the results easier, since the brand names used in the model are directly reported in the results table.
Note that this applies specifically to the Linear Regression tool. For other predictive tools, you may need to create the dummy variables yourself.
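For anyone who wants to see what the one-hot encoding described above looks like outside of Alteryx, here is a minimal Python sketch. The function name `one_hot` and the sample brands are made up for illustration; this is not the tool's actual implementation. Also keep in mind that regression software fitting an intercept usually treats one level as the baseline, so the report may show one fewer brand coefficient than there are brands.

```python
def one_hot(values):
    """Return the sorted distinct levels and one 0/1 row per input value.

    Hypothetical helper for illustration only.
    """
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Made-up sample of a brand column.
makes = ["Honda", "Toyota", "Ford", "Honda"]
levels, rows = one_hot(makes)

print(levels)  # column order of the indicator variables
for make, row in zip(makes, rows):
    print(make, row)
```

Each row has exactly one 1, in the column that matches that record's brand.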
Yes, I am using the Linear Regression tool. In fact, I just reran my workflow "with" (1-Honda, 2-Toyota) and "without" (Honda, Toyota) the conversion, and both give me the same results. I think, as you said, it takes care of the dummy variables.
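A small Python sketch of why both runs would agree, assuming the codes are stored as strings (as with V_String): relabeling the categories only renames the dummy columns, so the set of indicator columns going into the model is the same. The helper `dummy_columns` and the data are hypothetical. If the codes were left as a numeric type instead, the model would fit a single slope across 1, 2, 3, ..., which imposes an ordering on the brands and is a different model.

```python
def dummy_columns(values):
    """Map each distinct label to its 0/1 indicator column (illustrative helper)."""
    return {level: [1 if v == level else 0 for v in values] for level in set(values)}


# Made-up sample: the same records labeled by name and by string code.
names = ["Honda", "Toyota", "Ford", "Toyota", "Honda"]
codes = {"Honda": "1", "Toyota": "2", "Ford": "3"}
coded = [codes[n] for n in names]

# Compare the indicator columns themselves, ignoring their labels and order.
name_cols = sorted(dummy_columns(names).values())
code_cols = sorted(dummy_columns(coded).values())
same_design = name_cols == code_cols

print(same_design)
```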