
Alteryx Designer Desktop Discussions

SOLVED

Creating Dummy Variable with large Number of Values

MAAbdullahAlMubarah
8 - Asteroid

Dear Alteryx Experts 

 

I want to use linear regression for prediction, but I have a problem with the predictor (X) variables: three of them are categorical, each with at least 60 distinct values. If I use the Formula tool to create the dummy variables by hand, I believe it will take more than two or three days. Can anyone tell me a faster way to solve this?

Note: if a macro is needed, please provide it with an example workflow that creates dummy variables for 60 values.

14 REPLIES
jdunkerley79
ACE Emeritus

I suggest:

 

2019-01-02_10-10-08.jpg

 

First, create a unique list of values with the Summarize tool

Add a Record ID to group by later

Use an Append Fields tool to make a Cartesian join between the rows and the unique values of all the variables needed (note that you must select Allow All Appends in the Append Fields tool)

A simple formula tool to create the 1/0 values

Finally a Cross Tab to put the variables back on a single row
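
For readers who want to see the logic outside Designer, the steps above can be sketched in plain Python. This is only an illustration of the idea, not the attached workflow: the sample rows and the `record_id`, `dummy`, and `flag` names are made up, and only the `shop_id` field name comes from this thread.

```python
from itertools import product

# Hypothetical sample rows; only the field name is taken from the thread.
rows = [
    {"shop_id": "A"},
    {"shop_id": "B"},
    {"shop_id": "A"},
    {"shop_id": "C"},
]
field = "shop_id"

# Step 1: unique list of values (what the Summarize tool produces).
unique_values = sorted({row[field] for row in rows})

# Step 2: add a record ID so rows can be regrouped later.
records = [{"record_id": i, **row} for i, row in enumerate(rows, start=1)]

# Steps 3 and 4: Cartesian join of every record with every unique value
# (Append Fields), then a 1/0 flag for a match (Formula).
long_form = [
    {
        "record_id": rec["record_id"],
        "dummy": f"{field}_{value}",
        "flag": int(rec[field] == value),
    }
    for rec, value in product(records, unique_values)
]

# Step 5: pivot back to one row per record (Cross Tab).
wide = {}
for item in long_form:
    wide.setdefault(item["record_id"], {})[item["dummy"]] = item["flag"]

for record_id, dummies in wide.items():
    print(record_id, dummies)
```

Note that the intermediate table has one row per record per unique value, which is why the Cartesian join grows quickly on large inputs.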

 

Sample attached.

JoeS
Alteryx Alumni (Retired)

When you say create a dummy variable, do you mean, create a binary variable for each?

 

If so, the Linear Regression tool will already do this for you, within the macro.

 

You could also use a Transpose and a Cross Tab tool; I have attached an example.

 

Workflow.png
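
As a rough illustration of the Transpose plus Cross Tab idea, here is a minimal Python sketch. It is an assumption about how such a workflow behaves, not a copy of the attached one: the sample values are invented, and the sketch fills absent name/value combinations with 0, which may need an extra step in Designer. The three field names are the ones mentioned later in this thread.

```python
# Hypothetical sample rows; the field names come from the thread.
rows = [
    {"Shop_id": 5, "date_block_num": 0, "right_item_category_id": 40},
    {"Shop_id": 7, "date_block_num": 1, "right_item_category_id": 40},
]
fields = ["Shop_id", "date_block_num", "right_item_category_id"]

# Transpose: one (record_id, name, value) row per categorical field.
transposed = [
    (record_id, name, row[name])
    for record_id, row in enumerate(rows, start=1)
    for name in fields
]

# One output column per distinct name/value pair.
headers = sorted({f"{name}_{value}" for _, name, value in transposed})

# Cross tab: start every column at 0, then set 1 where the pair occurs.
crosstab = {record_id: dict.fromkeys(headers, 0) for record_id, _, _ in transposed}
for record_id, name, value in transposed:
    crosstab[record_id][f"{name}_{value}"] = 1

for record_id, dummies in crosstab.items():
    print(record_id, dummies)
```

Unlike the Cartesian-join approach, the intermediate table here has only one row per record per field, so it stays much smaller.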

 

EDIT: I got distracted for a bit and then got beaten to it by James!

MAAbdullahAlMubarah
8 - Asteroid

@jdunkerley79, @JoeS Thank you both for your help. I tried to implement it on this shared file:

 

https://drive.google.com/open?id=1F_sNYLXrEzS6tAjhWkYCm87UdQcNo1ks

but for @jdunkerley79's solution I got an error in the Append Fields tool ("Error: Append Fields (14): There were more than 16 records in the source"), and I still do not understand how to apply @JoeS's solution to my case. The fields I want to turn into dummy variables are:

 

Shop_id, date_block_num and right_item_category_id.

 

Thank you
 

BenMoss
ACE Emeritus

@MAAbdullahAlMubarah change the option at the bottom of the Append Fields tool to 'Allow All Appends'.

MAAbdullahAlMubarah
8 - Asteroid

@JoeS You are right: "Linear Regression tool will do this already for you".

When I started to implement the linear regression, it said the matrix is too large (XX GB) to be handled. That is why I thought I needed to create the dummy variables myself. In this case, what should I do to solve the large-matrix issue?

JoeS
Alteryx Alumni (Retired)

@MAAbdullahAlMubarah Yeah, R is going to struggle with memory to create the matrix.

 

I have re-attached my workflow with your file at the bottom. It still takes a while to run (12 mins on my laptop), but it should achieve what you are looking for.
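
To get a feel for why the matrix gets so large, here is a back-of-the-envelope sketch in Python. It assumes a dense matrix of 8-byte doubles with an intercept and k-1 dummy columns per categorical field (R's default treatment coding). The row count is a made-up figure, since the real one is not stated in this thread, and the function name is hypothetical.

```python
def dense_matrix_bytes(n_rows, levels_per_factor, bytes_per_cell=8):
    """Estimate bytes for a dense design matrix: intercept plus k-1 dummies per factor."""
    n_columns = 1 + sum(k - 1 for k in levels_per_factor)
    return n_rows * n_columns * bytes_per_cell


# Hypothetical size: 1,000,000 rows and three fields with 60 levels each.
estimate = dense_matrix_bytes(1_000_000, [60, 60, 60])
print(f"{estimate / 1024**3:.2f} GiB for one copy of the matrix")
```

This counts a single copy of the matrix only. Fitting the model typically needs several working copies, so the real peak memory is likely to be a multiple of this figure.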

jdunkerley79
ACE Emeritus

Make sure Allow All Appends is set in the Append Fields tool:

2019-01-02_11-53-51.jpg

MAAbdullahAlMubarah
8 - Asteroid

@JoeS How can I resolve the memory allocation problem?

JoeS
Alteryx Alumni (Retired)

I am not 100% sure you can with the matrix in open-source R. I think you will need to use Microsoft R instead, but I am not completely sure.
