I'm struggling to figure out how to create an efficient and intelligent module that will be easy to manage as my data becomes more complex. I've created an example here to show you what I mean. There must be a clever solution that will allow my workflow to scale up. I'll try to explain it as best I can...
And here is my question: This solution seems to work well, and although my example is much simpler than my real data, my solution works to automatically calculate the 'missing' rows required for my dashboard. However, what if I want to cut my data by 3, 4 or 5 variables? What if I want to cut the data by gender (x 3), income bands (x 10), age bands (x 10), country (x 50), etc?
I need to calculate a lot of rows here (which is fine, as my dashboard software is not smart enough to aggregate on the fly, but it does retrieve rows from a long table very quickly). The problem is that in its current layout, my module will have to grow massively for every extra variable. There must be a super smart way to design this process without having to create 10s or 100s of separate Summarize tools, all with slightly different settings. I wonder if there's a way to build a workflow or macro in which you can specify how to cut the data, and it does the rest. Does anyone have any ideas?
Solved! Go to Solution.
I managed to get the grouping ability coded... memory is still a concern: with 8GB of RAM, the following ran successfully from R studio, but did crash in Alteryx... I think just barely, since it was counting rows and getting close the nearly 10 million that processed in R studio.
Anyway, this allows a grouping within R, so everything is done at once.
A macro might be worth consideration, since it's less going on in R during each iteration, but perhaps this will work for you if you have 16GB.
Brilliant, this appears to work a charm. Thank you very much! Hopefully you won't hear any more from me....!
LOL. No worries. I should note that this Q&A at StackOverflow is what finally got me over the hump.
(The answer there was given by one of the developers of data.table in R.)