One of my users has a workflow with a relatively small amount of Excel files, but hundreds of tabs within each file (per below comment tool) for a total of about 6,600 tabs to read. Grand total, it's probably about 700,000 records, which generally processes in minutes for other use cases, but of course this configuration is much different from other ones I've seen.
At a small scale, the dynamic input runs fine after reading about 100 or so tabs at a rate about 1-2 sheets per second. But after that point, processing speed plummets to roughly 30 seconds per sheet. I'm fairly certain it has to do with Alteryx treating each tab as needing to close and reopen each file for every record, which kills the system. I'm calculating the workflow would take about 5-8 hours to run, which is not viable, since we need to target <20 minutes for my organization's Gallery policy.
Ordinarily, the workflow would run on a shared drive, but even through my local drive, it's still taking just as long.
I'm wondering if there's better logic I can use to perhaps trick Alteryx into understanding that we're dealing with a relatively small amount of data. Maybe a macro? I'm pretty good at designing batch macros, but I don't really have any good ideas that would address this issue. Changing the source files' format itself isn't an option, unfortunately. Thanks in advance for anyone's support.
I am not sure a Batch Macro will address the core issue of opening and closing each sheet. How many individual workbooks is this?
Instead you might consider using multiple dynamic input tools to run these in parrallel.
For example, maybe aim for about 500 to 1000 sheets per dynamic input instance. I am assuming you have at least 6 different workbooks.
Hi @JBLove ,
Thanks for the initial input. Per the screenshot, I have 16 workbooks. My workflow slows down to a crawl after 100 sheets, so shooting for 500-1000 per input tool may be unrealistic.
I don't necessarily need a macro as the solution-- I was just throwing out there as a possible approach. Maybe there's a different approach to this using a different mechanism, like the Python tool?
Ah I see the comment tool now showing all those useful details.
Perhaps to your original point the issue is in a bug with how the dynamic input tool works that 100 is when it starts to struggle.
You could try a Batch Macro with one file and see if that file can handle processing one workbook without the slow down. The approach I take with excel files looks to be exactly what you have done, just where you have the dynamic input tool I replace that with the Macro. Inside the Macro Your first tool will be an input tool that will be connected to a Control Parameter by way of an Action Tool. Basically you're going to pump from the outer workflow the entire file directory path\filename|||Sheet. Then if need be you can throw in a Select tool to standardize any data types and then use a Macro Output tool to send the data back to the workflow.
If that works then you need to figure out a way to list multiple workbooks and multiple sheets. You might check this post by @mceleavey that appears to handle this topic. I am not permitted to download files so I don't know if he actually used Batch macro or not. And perhaps this may even be an approach that solves your entire question.
Solved: Create a batch macro for Excel with multiple files... - Alteryx Community