I have a workflow that needs optimizing, and I'm trying to think of smarter ways to use the tools, starting with the simplest things that could be changed.
I have a large data set (370M records and growing every day) that goes through a Summarize tool with a number of 'Group By' fields. It just occurred to me that if one or two of these grouping levels were removed (because I could get that information another way outside of this flow), it might speed things up a bit.
Similarly, I noticed a Unique tool in use in multiple places, which I can probably consolidate into a single one if it's applied at the start of the flow rather than the end.
My guess is that both of these changes could improve performance (even 10 minutes shorter would be a welcome change!), but I was curious whether anyone knows specifically if these two tools can be a drain on performance when they're processing this many records.
The entire flow takes nearly 3 hours to run, and I'm not quite ready to test it yet, so I wanted to post here in the meantime in case anyone has any relevant or useful comments to share on this topic.
I think Performance Profiling would be a good place to start.
You turn it on in the Runtime tab of the workflow configuration (check "Enable Performance Profiling"). The results show up in the Results window once the workflow finishes. You can go to the Messages in the Results window and right-click to copy them out and store them.
Your thoughts on Group Bys and Unique tools impacting performance are correct. These are called 'blocking tools'. This means that instead of records passing through one by one as they're processed, all of the data must be read into the tool before anything moves on downstream. Other examples of tools like this are Join, Sort, Append Fields, Auto Field, etc. I like to refer to the periodic table of Alteryx tools (link below): the tools outlined in red are the ones that behave like this. Using them strategically will certainly help your performance when dealing with millions of records.
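If it helps to picture the difference, here is a rough Python sketch of a streaming step feeding a blocking step. It is only an analogy, not how Alteryx is implemented internally; the function names, the `log` list, and the sample records are all made up for illustration.

```python
from collections import defaultdict


def streaming_filter(records, log):
    """Streaming step: each record is passed downstream as soon as it is read."""
    for rec in records:
        log.append(f"filter saw {rec['id']}")
        if rec["amount"] > 0:
            yield rec


def blocking_summarize(records, log):
    """Blocking step: must read every input record before emitting any output."""
    totals = defaultdict(int)
    for rec in records:  # nothing can be emitted until this loop finishes
        totals[rec["group"]] += rec["amount"]
    log.append("summarize finished reading")
    for group, total in sorted(totals.items()):
        yield {"group": group, "total": total}


def run_demo():
    """Chain the two steps over a tiny, invented data set and return (result, log)."""
    log = []
    data = [
        {"id": 1, "group": "a", "amount": 5},
        {"id": 2, "group": "b", "amount": 0},
        {"id": 3, "group": "a", "amount": 7},
    ]
    result = list(blocking_summarize(streaming_filter(data, log), log))
    return result, log
```

The filter hands each record on immediately, while the summarize step has to hold its running totals until the last record arrives. That is why it tends to pay off to cut the record count (filtering, deduplicating) before a blocking tool rather than after it, and to avoid repeating the same blocking step in several places.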