I want to query a number of records large enough to exceed our 50 GB of working memory, process them (no aggregation), and upload them to a SQL Server table.
I'm trying to mimic the chunksize functionality of pandas, where a chunk of records is read from a file, processed, and then dropped from memory as the iteration moves on to the next chunk.
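Roughly, the pandas pattern I have in mind looks like this (the connection string, query, and table names are just placeholders):

import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string -- adjust for your own SQL Server instance.
engine = create_engine("mssql+pyodbc://user:password@my_dsn")

# Only `chunksize` rows are held in memory at any one time.
for chunk in pd.read_sql_query("SELECT * FROM source_table", engine, chunksize=100_000):
    processed = chunk  # row-level processing here, no aggregation
    # Append each processed chunk to the target table, then let it go out of scope.
    processed.to_sql("target_table", engine, if_exists="append", index=False)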
The Throttle tool does not accomplish this.
I don't see that batch macros accomplish this either, since they union all the records back together on the other side, which I want to avoid.
Any suggestions on this?
Hi @Joshman108,
Batch macros only union all the records on the other side if you're using a Macro Output tool inside the macro. If you instead use an Output Data tool inside the batch macro, each batch of records you send through will be uploaded to the table as it's processed. So creating a group id to filter on in a batch macro would let you upload to the table in chunks.
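Purely as an illustration of the group id idea (not actual macro code), here's the same chunking logic sketched in Python; the chunk size and column name are placeholders:

import numpy as np
import pandas as pd

CHUNK_SIZE = 100_000  # placeholder batch size

def add_group_id(df: pd.DataFrame) -> pd.DataFrame:
    # Every block of CHUNK_SIZE consecutive rows shares one group id (0, 1, 2, ...).
    out = df.copy()
    out["group_id"] = np.arange(len(out)) // CHUNK_SIZE
    return out

# In the batch macro, the control parameter iterates over the distinct group_id
# values, a Filter tool keeps one group per iteration, and an Output Data tool
# appends that group to the SQL Server table before the next iteration begins.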
Kind regards,
Jonathan
Hi @Joshman108
Where is your data coming from? Is there any way you can use a query in the Input Data tool to create the batches?
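Just a sketch of the paging idea in Python (assuming SQL Server and a stable ordering column such as an id; all names are placeholders). In Alteryx, the same OFFSET/FETCH query could sit in the Input Data tool, with the offset supplied by a batch macro's control parameter:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mssql+pyodbc://user:password@my_dsn")  # placeholder connection
PAGE_SIZE = 100_000
offset = 0

while True:
    # OFFSET/FETCH needs an ORDER BY on a stable column so pages don't overlap.
    page = pd.read_sql_query(
        f"SELECT * FROM source_table ORDER BY id "
        f"OFFSET {offset} ROWS FETCH NEXT {PAGE_SIZE} ROWS ONLY",
        engine,
    )
    if page.empty:
        break
    # Process the page here, then append it to the target table and move on.
    page.to_sql("target_table", engine, if_exists="append", index=False)
    offset += PAGE_SIZE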
Dan
I actually asked this question before I read about how Alteryx handles memory here.
By default, if a dataset doesn't fit in memory, Alteryx won't try to load it all into memory; it uses temporary files instead.
I guess I was a little thrown off because the Python tool doesn't seem to abide by this limit, and I had seen massive memory usage from it.
We're all good.