Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Help make my data aggregation workflow more efficient

jt_edin
8 - Asteroid

I'm struggling to figure out how to create an efficient and intelligent module that will be easy to manage as my data becomes more complex. I've created an example here to show you what I mean. There must be a clever solution that will allow my workflow to scale up. I'll try to explain it as best I can...

 

  1. I'm building an interactive dashboard to allow users to get key information from my organisation's database. Users can cut the data two ways, by gender, and by income. Users make their selections via checkboxes, and the number returned is a count of how many records meet those criteria:
    Capture.PNG
  2. As you can see, there are 3 options in the first box (Gender) and 3 in the second box (Income). That means there are 9 possible outcomes, enumerated below:

    9choices.PNG
  3. The data I retrieve from my database platform looks like this. I get 4 rows, which don't include aggregations for 'Both' or 'All'. 

    supplied.PNG

  4. If I get 4 rows from the database, then I need Alteryx to calculate the 'missing' 5 rows. These are the ones that are missing:

    missing.PNG

  5. Here is my module to fill in these 'missing' rows. The module is also attached to this post.
    Module.PNG

And here is my question: This solution seems to work well, and although my example is much simpler than my real data, my solution works to automatically calculate the 'missing' rows required for my dashboard. However, what if I want to cut my data by 3, 4 or 5 variables? What if I want to cut the data by gender (x 3), income bands (x 10), age bands (x 10), country (x 50), etc?

 

I need to calculate a lot of rows here (which is fine, as my dashboard software is not smart enough to aggregate on the fly, but it does retrieve rows from a long table very quickly). The problem is that in its current layout, my module will have to grow massively for every extra variable. There must be a super smart way to design this process without having to create 10s or 100s of separate Summarize tools, all with slightly different settings. I wonder if there's a way to build a workflow or macro in which you can specify how to cut the data, and it does the rest. Does anyone have any ideas?

22 REPLIES 22
JohnJPS
15 - Aurora

Thinking outside the box; when you retrieve the data from the database, do you have something like:

SELECT Gender, Income, count=count(*)

FROM tables

GROUP BY Gender, Income

 

If so... could you add "WITH ROLLUP" to the end of your GROUP BY:

SELECT Gender, Income, count=count(*)

FROM tables

GROUP BY Gender, Income WITH ROLLUP

 

That should do the "rollups" for you... you may need to massage it for look and feel, but at least the rollups will already be done.  (Also, this is for SQL Server... syntax may differ for other data sources).

 

(PS: I'm going to add this as a product idea... seems like it should be a checkbox in the Summary tool: very useful functionality.)

JohnJPS
15 - Aurora

PS, it can also be done in R... actually this is more robust than "WITH ROLLUP" since it includes all possibilities.  I added another column to make it more evident.

 

jt_edin
8 - Asteroid

Thank you, that looks really promising. I will try to figure out how it works. Unfortunately I've not used macros before so I don't know how to bring the output back into Alteryx, but will try to figure it out. 

 

If anyone else tries it and gets the following error (there is no package called 'data.table')

 

R error.PNG

 

...then you need to visit this page http://community.alteryx.com/t5/Alteryx-Knowledge-Base/Using-Additional-R-Packages-Times-Series-Clus... and download the Install_R_Packages.yxwz tool, and run it with this instruction: data.table

 

Also, make sure you have permission to write to the relevant drive.

JohnJPS
15 - Aurora

Tip: in Main Menu > Options > User Settings > Edit User Settings... the "Macros" tab allows you to specify a directory in order to add any macro found in that directory to a tab on your toolbar. Then you can drag it into a workflow from the toolbar, again, like any other tool.

 

Or more generially, on the canvas in any workflow where you wish to use a macro, just right-click and select Insert > Macro... then browse out to where it's saved to add it to your workflow.

 

Once you see it in the canvas, you can right-click any macro and "Open Macro"" which will open it on a new canvas: good, quick way to investigate what it's doing.  Mine here is just some R code... if desired you could copy that out of the macro and use just the R tool in your workflow.  Enjoy!

jt_edin
8 - Asteroid

Thank you! I can't begin to understand how your R code works, but I'm getting the hang of testing it with larger input data and it works flawlessly. So concise, this will save me a lot of time and space. Thank you so much for your first class assistance!

JohnJPS
15 - Aurora

Just as an aside... the nested for loops in my code were bugging me, so I asked about it on StackOverflow... the answer simplified everything so much that I thought it worth mentioning here as well.  I was also able to add another control to the macro allowing you to grab whatever columns you want included grouping. The R code is just a few lines of code new and much easier to follow now as well. All the trickery is embedded in base R functions now. :-)

 

jt_edin
8 - Asteroid

Thanks again, that does look succinct!

 

Unfortunately I'm getting the following error:

 

R error.PNG

 

R (1) Error in eval(expr, envir, enclos) : object 'target' not found

 

Any idea how to fix it? Thanks

 

R code.PNG

JohnJPS
15 - Aurora

Hi @jt_edin,

 

Yep, running by itself from Designer gave me that too... it was working when plugged into a workflow, but I've now cleaned it up so that it will test properly by itself too. (attached).  Thanks for catching that!

 

 - John

jt_edin
8 - Asteroid

The latest version labels aggregations "Sum". I'm assuming this is R terminology as there doesn't seem to be an easy way to change this from "Sum" to, for example, "All".

 

Of course, I can do that rename easily in Alteryx, but what would happen if one of my input categories had a value called "Sum"? I guess I'd have to pre-filter it out. If there's an easy way to specify it in the code then I'm keen to learn. 

Thanks

Labels