Hi Folks,
So I have a big data set, more than 100 columns and over 6 million rows. I have a Summarize tool that groups the data by date. This Summarize takes about 1 hour. Any idea why? Or what can I do better?
Are you doing a group by on date, as well as applying functions to the remaining 99 columns? This can be very processing-intensive, so I suggest narrowing down your scope to only include the data that you need.
Do you have other processes running on your machine at the same time? These may be competing for resources. I suggest letting the process run on its own. You can cache your workflow by right-clicking at the place where you want to "freeze" the data, so the Summarize only needs to run once.
If you're able to post your workflow as well as some sample data, I'm happy to take a look at where else efficiencies can be gained.
Hope this helps!
Unfortunately I cannot post the workflow or a data sample because the data is confidential. But I was using Summarize with group by on dates; the rest are sum and group by on string and double data types. How can I cache this when at least part of the data in the workflow changes every day?
Hi @SouravKayal
Have you checked whether you get better performance using the AMP engine? It would be good to also check its help doc. It's available from Designer 20.2 on.
I am using 2019.4 in my org.
Can I use cache and run when the input data changes every day?
I used the profiler tool and saw data cleansing taking 67%, so I wanted to know how I can replace that. In 2019.4 I don't have the sum function, so I'm using + to add, and at times it gives null if a column has no value. I have to replace all nulls with 0 to make sure the addition happens.
The cleansing building block is a macro. What you are describing could also be achieved using the "Multi-Field Formula" tool to replace blanks with 0.
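As a rough sketch of that approach (not taken from the original workflow, which wasn't shared), an expression along these lines in the Multi-Field Formula tool should replace nulls with 0 in every selected numeric field. It assumes the fields are numeric and that missing values arrive as nulls rather than empty strings:

```
// Multi-Field Formula expression, applied to each selected numeric field
IIF(IsNull([_CurrentField_]), 0, [_CurrentField_])
```

If only a couple of fields are being added, the same check could instead go straight into a regular Formula tool. [Amount_A] and [Amount_B] are hypothetical field names used only for illustration:

```
// Formula tool expression; [Amount_A] and [Amount_B] are placeholder names
IIF(IsNull([Amount_A]), 0, [Amount_A]) + IIF(IsNull([Amount_B]), 0, [Amount_B])
```

Either way the null handling happens in a single tool, which may be lighter than running the full cleansing macro over all 100+ columns, though that would need to be confirmed by profiling again.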
