Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Data Aggregation

rtusco
7 - Meteor

Hi alteryx community,

 

I am looking for some help with data aggregation and I wasn't able to find the answer searching online. I've created a mock up input and output data set below to go with the narrative of what I am trying to achieve.

 

Input:

 

GroupProduct CodeGroup Linkage NumberName of ProductValue
AA20AA20_10Main Game1000
AA20AA20_21DLC200
AA20AA20_31Expansion600
AA20AA20_43DLC50
BC30BC30_10Main Game1600
BC30BC30_21Expansion700
BC30BC30_31DLC250
BC30BC30_43MOD350

 

As you can see in the above input we are looking at "video games". These are grouped using the "Group" column, which identifies a certain title. All formulae work should be done within each individual group.

 

A single group can contain multiple entries, which are further identified in the "Product Code" column, which is simply the "Group" with an underscore and a digit, starting from 1 and counting upwards. Games can be "Main Game", "Expansion", "DLC", "MOD", etc. and they all have a certain "Value" associated with them.

 

Now, the titles do have a Parent-Child dependency between them. This is determined in the "Group Linkage Number" column. A "Main Game" product is the top parent, not linked to anything, so its linkage number is "0". For any other products, these will be linked to another product, and the "Linkage number" is the "underscore number" of the parent's "Product Code". So, looking at group "AA20", there's a DLC and Expansion, both linked to the "Main Game". Since the main game is product code "AA20_1" then the linked DLC and Expansion will have a linkage number of 1. As another example, DLC product code "AA20_4" has a linkage number 3, because its immediate parent is the expansion product "AA20_3". I hope this makes sense.

 

I need to aggregate data so that it ONLY shows Main Games and Expansions with the value being a sum of all other child entries, summarised within the main parent. 

 

In the case of group AA20, that's easy enough. I would use a formula to substitute the Linkage Number into the Product Code and then perform a summarise on the Product Code.

 

In case of group "BC30" that's more problematic as there is a two step process. The MOD "BC30_4" has a linkage number 3, whereas the product "BC30_3" (the immediate parent) is itself a DLC, which is linked to the Main Game, "BC30_1". When I apply the above approach to this input, I get the following output:

 

GroupProduct CodeGroup Linkage NumberName of ProductValue
AA20AA20_10Main Game1200
AA20AA20_31Expansion650
BC30BC30_10Main Game1850
BC30BC30_21Expansion700
BC30BC30_33MOD350

 

AA20 is all fine, but there is an issue in BC30. The DLC which used to be BC30_3 was correctly changed into BC30_1 and summarised into the main game, but the mod, which used to be BC30_4 has been renamed to BC30_3 and is now orphaned.

 

A logical step would be for the MOD to first be "summarised" into the DLC, which would then be summarised into the Main Game. I don't know how to do this, as there can potentially be many steps in this parent-child relationship, but we still need to aggregate all "secondary" products (MODs, DLCs) and report them with their ultimate parents, the Main Games and Expansions. I assume that multi-row formula would probably need to be involved, but I leave this problem to your bright minds.

 

Thank you very much and apologies for the wall of text.

8 REPLIES 8
MatthewO
Alteryx
Alteryx

@rtusco I believe the attached workflow will get you the desired the result. It leverages a Multi Row Formula to update the Group Linkage Number when a secondary parent/child relationship exists. I hope this is helpful!

rtusco
7 - Meteor

Hi MatthewO,

 

Thanks so much for your response. Your workflow has been useful, particularly the multi-row formula step. I was able to adapt my own workflow based on this to successfully aggregate nearly all of the data.

 

There is however a snag, in respect of the multi-row formula looking at Row-1, as this assumes that the child immediately succeeds its parent in the data - that is not always the case. To illustrate, I've prepared an updated input below, adding the LinkID column created by your example workflow:

 

GroupProduct CodeGroup Linkage NumberName of ProductValueLink ID
BC30BC30_10Main Game2001
BC30BC30_21Expansion1202
BC30BC30_31Expansion1003
BC30BC30_42DLC504
BC30BC30_54MOD305
BC30BC30_64MOD256
BC30BC30_75MOD137

 

When I run this data through the workflow it essentially changes to the following (focusing on the Group Linkage Number)

 

GroupProduct CodeGroup Linkage NumberName of ProductValueLink ID
BC30BC30_10Main Game2001
BC30BC30_21Expansion1202
BC30BC30_31Expansion1003
BC30BC30_42DLC504
BC30BC30_52MOD305
BC30BC30_64MOD256
BC30BC30_75MOD137

 

The Group Linkage Number of BC30_5 is the same as the LinkID of BC30_4, so it correctly switches its Group Linkage Number from 4 to 2.

However, BC30_6 is not linked to 5, but to 4 as well, so the Row-1 formula does not trigger and this record (as well as the subsequent one) stay with the original Group Linkage Number of 4 and 5 respectively. They should all have linkage numbers of 2 and get summarised with product BC30_2.

 

I think we are nearly there and the approach works, but we would need some formula that searches not just Row-1 but all previous rows (within the Group only) and change the Group Linkage Number when it finds a match within LinkID.

 

Any ideas?

MatthewO
Alteryx
Alteryx

@rtusco when searching recursively you will need to leverage a macro. I have attached a new version of the workflow that uses an Iterative Macro to identify and summarize the parent/child values. 

 

image.png

rtusco
7 - Meteor

Hi MatthewO,

 

I think this Macro is definitely the way to go, but it is need of one final amendment before it is ready to go. I've been trying to figure out how macros work and build one myself, however I am struggling and time is of the essence. I would therefore be much obliged if you could have another look at this, please.

 

In the real dataset, there are way more columns to "Group By" or select "First" and way more numerical columns to "Sum" than is shown in the mock input. As such I have a question and an ask.

 

Question: If the data incoming into the macro has more columns than has been configured in the macro, how does the macro deal with those? Does it ignore them (and work with the configured ones) or apply some default behaviour to any newly encountered/unfamiliar columns? Should the macro be designed with headers for all columns likely to be present in the full input?

 

Ask: Since there are a lot of columns to sum, I think rather than trying to achieve this all in the macro, this should be done outside of it. Could it be possible to have a macro that just focuses on establishing relationships between each entry and its ultimate parent? I would like a macro that searches recursively and updates the Product Code column (if need be we can create a new column) so that every Secondary item (MOD, DLC, etc.) has the same Product Code as its ultimate parent, be it main game or expansion. In other words, it would be enough if I could change this:

 

Product Code
BC30_1
BC30_2
BC30_3
BC30_4
BC30_5
BC30_6
BC30_7

  

..into this:

 

Product Code
BC30_1
BC30_2
BC30_3
BC30_2
BC30_2
BC30_2
BC30_2

 

I could then simply "Group by"  [Product Code] and then SUM (or do any other mathematics) for other columns outside the Macro. Is that at all possible?

 

I hope that makes sense and thank you for your time!

MatthewO
Alteryx
Alteryx

@rtusco the effectiveness of this macro will be heavily dependent on the data itself. You mentioned that you have many additional columns to group by. These would need to be configured in the macro logic. Adding additional fields would cause the current macro to error, or yield incorrect results. Additionally, it is predicated on the fact that the parent/child hierarchy IDs are truly sequential. For example, a child cannot be a lower id value than the parent. That said, I'm not sure this is a useful solution for you but likely a reference to work from.

rtusco
7 - Meteor

Hi MatthewO,

 

That did the trick! I was able to take the macro and after a few tweaks I got it to produce the expected aggregated set of data. So thank you very much for your help!

 

A parting question about macros: does the macro always have to travel as a separate file, next to the workflow (based on relative paths) or is there a way to integrate the macro as a "custom tool", so that if you share the workflow with someone else, that workflow will contain the macro within it?

MatthewO
Alteryx
Alteryx

@rtusco a macro will always exist as it's own file. There are two ways I will recommend for ensuring the macro exists when collaborating with other Designer users.

 

  • Export the Workflow: exporting a workflow allows you to include the macro files with the main workflow as a single package. When another user opens the packaged file, the macros will be included. When packaging a workflow the macros are included as a copy, they are not dynamic. In other words, if the macro is changed later, it would be necessary to repackage and re-share the file.

  • Save Macros to a Network Drive: this is often the best option if you plan to collaborate with other Designer users. Saving macros to a common network drive and using the absolute path will also ensure that other Designer users can access the macros when they are shared. When the workflow runs, it will reference the most current version of the macro, not a copy. Additionally, you can specify a path to macro path in the Designer User Settings so that the macros have their own tool palette.
rtusco
7 - Meteor

Thank you for all your help, MatthewO!

Labels