Summarize Tool
Hi Folks,
I have a big data set: more than 100 columns and over 6 million rows. I use a Summarize tool to group the data by date, and this Summarize takes about an hour to run. Any idea why, or what I can do better?
Solved! Go to Solution.
- Labels:
- Preparation
Are you doing a group by on date, as well as applying functions to the remaining 99 columns? This can be very process intensive, so I suggest narrowing down your scope to only include the data that you need.
Do you have other processes running on your machine at the same time? They may be competing for resources, so I suggest letting the workflow run on its own. You can also cache your workflow by right-clicking the tool where you want to "freeze" the data, so the Summarize only needs to run once.
If you're able to post your workflow as well as some sample data, happy to take a look at where else efficiencies can be gained.
Hope this helps!
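To illustrate the advice above outside of Alteryx, here is a minimal pandas sketch of the same idea: grouping by date while carrying only the columns you actually need is far cheaper than aggregating all 100. The column names here are hypothetical, not from the original workflow.

```python
import pandas as pd

# Tiny stand-in for the large data set (hypothetical columns).
df = pd.DataFrame({
    "date":   ["2021-01-01", "2021-01-01", "2021-01-02"],
    "sales":  [10.0, 20.0, 5.0],
    "region": ["EU", "EU", "US"],
})

# Narrow the scope first: keep only the columns needed downstream,
# then group by date and sum.
slim = df[["date", "sales"]]
result = slim.groupby("date", as_index=False)["sales"].sum()
```

The same principle applies in Designer: dropping unused fields with a Select tool before the Summarize reduces the work the Summarize has to do.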
Unfortunately I cannot post the workflow or a data sample because the data is confidential. I was using a Summarize to group by dates; the rest are sums and group-bys on string and double data types. How can I cache this when at least part of the data in the workflow changes every day?
Hi @SouravKayal,
Have you tried whether you get better performance with the AMP engine? It's worth checking its help documentation as well. AMP is available from Designer 2020.2 onward.
I am using 2019.4 in my org.
Can I use Cache and Run when the input data changes every day?
I used the profiling tool and saw Data Cleansing taking 67% of the runtime, and I wanted to know how I can replace it. Since in 2019.4 I don't have the Sum function, I use + to add columns, which at times returns Null if a column has no value. I have to replace all Nulls with 0 to make sure the addition works.
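The null-propagation problem described above can be shown with a minimal pandas sketch (again outside Alteryx, with made-up column names): adding columns with `+` yields a null result wherever either operand is null, so the nulls are replaced with 0 first.

```python
import numpy as np
import pandas as pd

# Two columns where some rows are missing a value.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0],
                   "b": [4.0, 5.0, np.nan]})

# Naive addition: any row with a null in either column produces null.
naive = df["a"] + df["b"]

# Replacing nulls with 0 before adding keeps every row's total.
safe = df["a"].fillna(0) + df["b"].fillna(0)
```

This is the same fix as replacing Nulls with 0 before the + in a Formula tool; the difference is only where the replacement happens.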
The Data Cleansing tool is a macro. What you are describing could also be achieved with the Multi-Field Formula tool, replacing blanks with 0. See the screenshot below.
