Too Much Data Hogging Your Machine? Keep Only What You Need!

Sometimes multiple conclusions can be drawn from the same data.  OK, often multiple conclusions can be drawn from the same data.  This is especially true of the Connection Progress indicator that appears between tools.  You may already be a bit familiar with it.  When you run a module, you may see something similar to the following:

114 GB of data is being passed through my data stream!  Is this a lot?  Well, yes, but we have to remember that Alteryx processes everything in memory.  Knowing this, the figure we see above doesn't mean that 114 GB of data is being written directly to disk (many PCs don't even have that much free disk space).  Simply put, there is a ton of data there, but if you do not have any type of output connected to the tool, it stays in memory.  If we were to connect, say, a Browse Tool to the end of my XML Parse Tool shown above, the temp file written out by the Browse Tool would in fact be every bit of that 114 GB.  Luckily, I don't really need the data written out at this point (I'm performing further analysis downstream), so I simply add a Select Tool just after it and deselect the field holding the massive amount of data.  Just like magic, my module runs quickly and efficiently.

This little bit of info can be both extremely valuable and scary at the same time.  The value is simply that it shows you the amount of data you are dealing with.  The scary part is that it is easy to assume all of it is being written out to disk during runtime.  We now know that as long as we're not attaching a Browse Tool to the data at this point, and we deselect the fields we do not need further downstream, we keep our module tidy and efficient!
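The Select Tool step boils down to "keep only the fields you need."  As a rough illustration outside of Alteryx (this is plain Python, not anything Alteryx itself runs), the sketch below drops a large raw_xml field from some made-up records and compares the approximate payload size before and after.  The field names, the record count, and the use of UTF-8 byte length as a size measure are all assumptions for illustration; Alteryx computes its Connection Progress figures in its own way.

```python
# Illustrative sketch only, not Alteryx code: mimic a Select Tool that
# deselects one large field, then compare the approximate payload size.
# The field names (id, title, raw_xml) and the data are hypothetical.

def payload_bytes(records):
    """Sum the UTF-8 byte length of every value in every record."""
    return sum(len(str(value).encode("utf-8"))
               for record in records
               for value in record.values())

def select_fields(records, keep):
    """Return new records containing only the fields named in `keep`."""
    return [{name: record[name] for name in keep if name in record}
            for record in records]

# Hypothetical parsed output: every record still carries its raw XML blob.
records = [
    {"id": str(i), "title": f"Item {i}",
     "raw_xml": "<item>" + "x" * 10_000 + "</item>"}
    for i in range(1_000)
]

before = payload_bytes(records)
after = payload_bytes(select_fields(records, keep=["id", "title"]))
print(f"before: {before:,} bytes, after: {after:,} bytes")
```

Nearly all of the size lives in the one field being dropped, which is the same reason deselecting a single oversized field in the Select Tool can make such a difference downstream.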

Until next time,

- Chad
