Dear Alteryx Experts,
I want to use linear regression for prediction, but I have a problem with the predictor variables (the X variables): three of them are categorical, and each has at least 60 distinct values. If I use the Formula tool to create the dummy variables, I believe it will take more than two or three days. Can anyone tell me a faster way to solve this?
Note: if a macro is needed, please provide it with an example workflow that creates dummy variables for 60 values.
I suggest:
First, create a unique list of values with the Summarize tool
Add a Record ID to group by
Use an Append Fields tool to make a Cartesian join of the unique values to every row (note that you must allow all appends in the Append Fields tool)
Use a simple Formula tool to create the 1/0 values
Finally, use a Cross Tab tool to put the variables back on a single row
Sample attached.
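For anyone who wants to see the logic outside of Alteryx, here is a minimal Python sketch of the same idea: build the unique list, compare every record against every value, and keep one row per record with 1/0 flags. It is only an illustration, not the attached workflow. The function name `make_dummies` and the sample rows are hypothetical; only the field name `Shop_id` comes from this thread.

```python
def make_dummies(rows, field):
    """Return one dict per record with a 1/0 column for each distinct value of `field`."""
    # Unique list of values (the Summarize step)
    levels = sorted({row[field] for row in rows})
    encoded = []
    # Record ID, comparison against every level, and one output row per record
    # (the Record ID, Append Fields, Formula and Cross Tab steps combined)
    for record_id, row in enumerate(rows, start=1):
        flags = {f"{field}_{level}": int(row[field] == level) for level in levels}
        encoded.append({"RecordID": record_id, **flags})
    return encoded


# Hypothetical sample data, not taken from the shared file
sample = [{"Shop_id": 5}, {"Shop_id": 12}, {"Shop_id": 5}]
for line in make_dummies(sample, "Shop_id"):
    print(line)
```

Each distinct value becomes its own column, so a field with 60 values produces 60 flag columns without writing 60 formulas by hand.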
When you say create a dummy variable, do you mean create a binary variable for each value?
If so, the Linear Regression tool will already do this for you, within the macro.
You could also use a Transpose and a Cross Tab tool; I have attached an example.
EDIT: got distracted for a bit and then got beaten to it by James!
@jdunkerley79, @JoeS Thank you both for your help. I tried to implement it on this shared file:
https://drive.google.com/open?id=1F_sNYLXrEzS6tAjhWkYCm87UdQcNo1ks
but I got an error in the Append Fields tool ("Error: Append Fields (14): There were more than 16 records in the source") with @jdunkerley79's solution, and I still do not understand how to apply @JoeS's solution to my case. The fields I want to turn into dummy variables are:
Shop_id, date_block_num and right_item_category_id.
Thank you
@MAAbdullahAlMubarah Change the option at the bottom of the Append Fields tool to 'Allow All Appends'.
@JoeS You are right that the "Linear Regression tool will do this already for you".
When I started to run the linear regression, it said the matrix is too large (XX GB) to be handled. That is why I thought I needed to create the dummy variables myself. In this case, what should I do to solve the large-matrix issue?
@MAAbdullahAlMubarah Yeah, R is going to struggle with memory to create the matrix.
I have re-attached my workflow with your file at the bottom. It still takes a while to run (12 mins on my laptop), but it should achieve what you are looking for.
Make sure Allow All Appends is set in the Append Fields tool.
@JoeS How can I resolve the memory allocation problem?
I am not 100% sure you can with a matrix of that size in open-source R. I think you will need to use Microsoft R instead, but I am not completely sure.
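To get a rough feel for why the matrix grows so large, here is a back-of-the-envelope sketch in Python. It assumes the model matrix is stored densely as 8-byte doubles (which is how R stores numeric matrices), that each categorical field has 60 values (the question says at least 60), and that one reference level per field is dropped in favour of an intercept. The row count is purely hypothetical and is not taken from the shared file.

```python
def dense_matrix_gb(n_rows, n_columns, bytes_per_value=8):
    """Approximate size in GB of a dense matrix of fixed-width values."""
    return n_rows * n_columns * bytes_per_value / 1024**3


# Three categorical fields with about 60 values each (from the question).
levels_per_field = [60, 60, 60]

# Intercept plus (levels - 1) dummy columns per field.
n_columns = 1 + sum(k - 1 for k in levels_per_field)

# HYPOTHETICAL row count, used only to illustrate the arithmetic.
n_rows = 1_000_000

print("columns:", n_columns)
print("approx. GB for one dense copy:", round(dense_matrix_gb(n_rows, n_columns), 2))
```

This counts a single copy of the matrix only; fitting the model may need several working copies, so actual memory use can be noticeably higher. Almost every value in those dummy columns is 0, which is why the dense form is so wasteful.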