I have complex large size data sets comprizng many months that I arrvied at following subset with one avg. column averaging all months.
Now I want to run by country cumulative % contribution of each material in data set. something like below, (column in bold font is what I wish to achieve)
all other calculated columns are done...
done with sorting and adding running totals with row id function added. Now want to achieve Country wise total and then add Cumulative % field added to it.. Any help appreciated. Since data set is large, avg. execution time is around 30 min... so any experiment require me to wait for 30 min for each attempt... is there a way to execute part of the flow only ?
Country | Material | Vol | Cum Total | Country Total | Cum % |
CH | 0006 | 125 | 125 | 199 | 63% |
CH | 0005 | 49 | 174 | 199 | 87% |
CH | 0004 | 25 | 199 | 199 | 100% |
CN | 0009 | 700 | 700 | 1400 | 50% |
CN | 0008 | 500 | 1200 | 1400 | 86% |
CN | 0007 | 200 | 1400 | 1400 | 100% |
US | 0003 | 109 | 109 | 248 | 44% |
US | 0002 | 89 | 198 | 248 | 80% |
US | 0001 | 50 | 248 | 248 | 100% |
Solved! Go to Solution.
Hi @PrathameshMDLZ ,
Here is a workflow for the task.
Input:
Output:
Workflow:
Hope this helps 🙂
If this post helps you please mark it as solution. And give a like if you dont mind 😀👍
For a smaller data set, use the Sample tool to sample out a handful of records for testing. 100, 200, whatever works for you
A Summarize tool will get you the maximum Cum Total for each Country. Group on Country, use Max for the
Cum Total, and create the column for Country Total. Join that back to your original data on Country field.
You can then use a formula tool to calculate percentage of Cumulative total.
Let me know if that helps.
Cheers!
Esther
You can use the cache and run(C&R) workflow option near the end of your workflow to help you test out options. C&R runs the workflow up to that point and saves the results to a temporary file. Any time you run from then on, the input is read from the cache reducing the run time from 30 mins to seconds.
To use C&R, right click on a tool near the end of your workflow and select the "Cache and Run Workflow" option.
There are a few conditions the the tool has to meet to have the C&R option enabled
1. The tool can only have 1 output anchor
2. The tool can't be inside a split stream. A split stream is when a data flow is split in two and the two parts are rejoined further on. The tools inside the green box above are inside a split stream that is started by the Join at the upper left and ended at the Union tool where #1 and #2 streams come together. In this case, the C&R option on the select highlighted in blue is grayed out.
Another thing to note is that the cached data is only available while the workflow is open in Designer. If you close either the workflow or Designer itself, you'll have to cache the data again the nest time you open the workflow
Dan
This worked perfectly.... thanks for quick and simple solve... 🙂 millions of records so experimentation was difficult....