
Alteryx Designer Desktop Discussions

SOLVED

How to handle a dataset beyond memory limits?

Joshman108
8 - Asteroid

I want to query a set of records large enough to exceed our 50 GB working-memory capacity, process it (no aggregation), and upload the result to a SQL Server table.

I'm trying to mimic the chunksize functionality of pandas, where a chunk of records is read from the file, processed, and then dropped from memory as the iteration moves on to the next chunk.
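Roughly what I'm after, as a sketch (the file name, chunk size, and processing step are placeholders):

import pandas as pd

# Read the source in fixed-size chunks so only one chunk is held in memory at a time.
for chunk in pd.read_csv("records.csv", chunksize=100_000):
    processed = chunk  # per-chunk processing goes here (no aggregation)
    # ...write the processed chunk out before the next one is read...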


The Throttle tool does not accomplish this.

I don't see that batch macros accomplish this either, since they union all records back together on the other side, which I want to avoid.

Any suggestions on this?

3 REPLIES
Jonathan-Sherman
15 - Aurora

Hi @Joshman108,

Batch macros only union all records on the other side if you're using a Macro Output tool in the macro. If you use an Output Data tool inside the batch macro instead, each batch of records you send through is uploaded to the table as it arrives. Creating a group id to filter on in the batch macro would therefore let you upload to the table in chunks.
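As a rough Python analogue of that idea (not the macro itself; the connection string and table name below are placeholders):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mssql+pyodbc://user:pass@my_dsn")  # hypothetical SQL Server connection

# Each pass plays the role of one batch: the loop index is the group id the
# control parameter would filter on, and the write happens inside the "macro",
# so nothing gets unioned back together afterwards.
for group_id, batch in enumerate(pd.read_csv("records.csv", chunksize=100_000)):
    batch.to_sql("target_table", engine, if_exists="append", index=False)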


Kind regards,
Jonathan

danilang
19 - Altair

Hi @Joshman108

Where is your data coming from? Is there any way that you can use a query in the input to create batches?
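For example, on SQL Server you could page through the source with OFFSET/FETCH so each batch only pulls one slice (sketched in Python below; the DSN, table and id column are placeholders):

import pyodbc

conn = pyodbc.connect("DSN=my_dsn")  # hypothetical DSN
batch_size = 100_000
offset = 0
while True:
    # Each query returns only one slice of the source table
    rows = conn.execute(
        "SELECT * FROM source_table ORDER BY id "
        "OFFSET ? ROWS FETCH NEXT ? ROWS ONLY",
        offset, batch_size,
    ).fetchall()
    if not rows:
        break
    # ...process and upload this slice before fetching the next one...
    offset += batch_size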


Dan  

Joshman108
8 - Asteroid

I actually asked this question before I read about how Alteryx handles memory here.

By default, if a dataset doesn't fit in memory, Alteryx won't try to load it all into memory; it spills to temporary files instead.

I guess I was a little thrown off because the Python tool doesn't seem to abide by this limit, and I had seen massive memory usage coming out of it.
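If it helps anyone later: my understanding (an assumption on my part, not from the docs) is that the Python tool materialises its entire input as a pandas DataFrame, so the data sits in Python's own memory rather than in the engine's temporary files:

from ayx import Alteryx

df = Alteryx.read("#1")                    # pulls the whole incoming stream into a DataFrame
print(df.memory_usage(deep=True).sum())    # can far exceed the engine's working-memory setting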


We're all good.
