
Alteryx Designer Desktop Discussions

SOLVED

Help make my data aggregation workflow more efficient

jt_edin
8 - Asteroid

I'm struggling to figure out how to create an efficient and intelligent module that will be easy to manage as my data becomes more complex. I've created an example here to show you what I mean. There must be a clever solution that will allow my workflow to scale up. I'll try to explain it as best I can...

 

  1. I'm building an interactive dashboard to let users pull key information from my organisation's database. Users can cut the data two ways: by gender and by income. They make their selections via checkboxes, and the number returned is a count of how many records meet those criteria:
    Capture.PNG
  2. As you can see, there are 3 options in the first box (Gender) and 3 in the second box (Income). That means there are 3 x 3 = 9 possible combinations, enumerated below:

    9choices.PNG
  3. The data I retrieve from my database platform looks like this. I get 4 rows (2 genders x 2 income bands), which don't include the aggregated 'Both' or 'All' levels.

    supplied.PNG

  4. Since the database supplies only 4 of the 9 rows, I need Alteryx to calculate the remaining 5. These are the 'missing' rows:

    missing.PNG

  5. Here is my module to fill in these 'missing' rows. The module is also attached to this post.
    Module.PNG

And here is my question: this solution works well, and although my example is much simpler than my real data, it automatically calculates the 'missing' rows required for my dashboard. However, what if I want to cut my data by 3, 4 or 5 variables? What if I want to cut the data by gender (x 3), income bands (x 10), age bands (x 10), country (x 50), etc.?

 

I need to calculate a lot of rows here (which is fine, as my dashboard software is not smart enough to aggregate on the fly, but it does retrieve rows from a long table very quickly); the example above already implies 3 x 10 x 10 x 50 = 15,000 output rows. The problem is that, in its current layout, my module has to grow massively with every extra variable: with k variables there are 2^k possible grouping combinations, and each one needs its own slightly different Summarize tool. There must be a smarter way to design this process without creating tens or hundreds of separate Summarize tools. I wonder if there's a way to build a workflow or macro in which you specify how to cut the data, and it does the rest. Does anyone have any ideas?
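One way to picture that "specify the cuts and it does the rest" idea is to loop over every subset of the grouping columns and stack one summary per subset. Here is a minimal R sketch of that pattern (R is where the accepted answer below ends up); the gender/income/count columns, their values, and the 'Both'/'All' labels are illustrative assumptions, not the poster's real schema:

```r
library(data.table)

# Toy stand-in for the 4 supplied rows: one pre-aggregated record per
# gender/income combination (column names and counts are illustrative).
dt <- data.table(
  gender = c("Male", "Male", "Female", "Female"),
  income = c("Low",  "High", "Low",    "High"),
  count  = c(10, 20, 30, 40)
)

dims   <- c("gender", "income")               # add age_band, country, ... here
totals <- c(gender = "Both", income = "All")  # label for each "all levels" row

# Every subset of the grouping columns: {}, {gender}, {income}, {gender, income}
subsets <- unlist(
  lapply(0:length(dims), function(k) combn(dims, k, simplify = FALSE)),
  recursive = FALSE
)

pieces <- lapply(subsets, function(by_cols) {
  piece <- if (length(by_cols) == 0L) {
    dt[, .(count = sum(count))]               # grand total: no grouping at all
  } else {
    dt[, .(count = sum(count)), by = by_cols]
  }
  # Columns we did not group by get their "all levels" label.
  for (col in setdiff(dims, by_cols)) piece[, (col) := totals[[col]]]
  piece
})

result <- rbindlist(pieces, use.names = TRUE, fill = TRUE)
setcolorder(result, c(dims, "count"))
# result holds all 9 rows: the 4 supplied combinations plus the 5 marginals.
```

This is essentially what a batch macro full of Summarize tools would do, one pass per combination; the reply below collapses all of those passes into a single call.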

22 REPLIES
JohnJPS
15 - Aurora

I managed to get the grouping ability coded... memory is still a concern: with 8 GB of RAM, the following ran successfully from RStudio, but it did crash in Alteryx... I think only just, since it was counting rows and getting close to the nearly 10 million that processed in RStudio.

 

Anyway, this lets the grouping happen within R, so everything is done at once.
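The R code attached to this reply isn't reproduced in the thread, so as a hedged illustration only: data.table's cube() matches the description here, computing the aggregate for every combination of the grouping columns in a single call (and the StackOverflow pointer below is answered by a data.table developer). A minimal sketch under that assumption, reusing the illustrative columns from above:

```r
library(data.table)   # needs a version of data.table that ships cube()

dt <- data.table(
  gender = c("Male", "Male", "Female", "Female"),
  income = c("Low",  "High", "Low",    "High"),
  count  = c(10, 20, 30, 40)
)

# cube() aggregates over every subset of the by-columns in one call,
# returning the 4 supplied rows plus the 5 marginal totals.
out <- cube(dt, j = list(count = sum(count)), by = c("gender", "income"))

# cube() marks each "all levels" row with NA; relabel them for the dashboard.
out[is.na(gender), gender := "Both"]
out[is.na(income), income := "All"]
```

If only some of the combinations are wanted, groupingsets() is the more general sibling: it takes an explicit list of grouping sets instead of expanding all of them.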

 

A macro might be worth considering, since there's less going on in R during each iteration, but perhaps this will work for you if you have 16 GB.

 

jt_edin
8 - Asteroid

Brilliant, this appears to work a charm. Thank you very much! Hopefully you won't hear any more from me...!

JohnJPS
15 - Aurora

LOL. No worries. I should note that this Q&A at StackOverflow is what finally got me over the hump. 

(The answer there was given by one of the developers of data.table in R.)
