This is my workflow:
My input tool at the top: 2.4 GB
My input tool at the bottom: 90kb
In the first Join tool: 4.5 GB
in the second Join tool: 350 GB >> alteryx crashed due to lack of temp drive space.
Why does this happen? How does the data become so big in the middle like that? How do I fix this? I can't upgrade my RAM and stuff because this is on a work laptop.
Hi @novalcia
Things to look for
Dan
@novalcia Please check the data Quality as rightly said by @danilang null or empty values unwantedly increase the size of data to huge no suppose you have 100 null in 1st input and 100 null in 2nd input this will generate 10000 records which will definitely be a issue.
also i suggest to add block until done after first join tool and try running it.
drop un-needed extra fields. when joining on large files use autofields to lower string sizes. drop duplicate fields.
short is - we can't 100% tell you - for all we know one of these tools has a blob field which is huge - but those are the standard places I'd look.... also - don't be afraid of using In-DB tools if this data/some of this data is in a Database.
This is my Join tools configuration:
This is my first Join tool, and I deselected the only right input.
This is the second join tool, and I deselected all right inputs, because they're duplicates and I only need the left ones.
I just learned that my data may be ballooning because of Cartesian Join? But how is that possible? I only joined the Acc Num (in join 1) and Identifier (in join 2), I didn't join multiple columns.Shouldn't it output only those that have the same identifier (for join 2)? And I know there are no nulls in my identifier column.Even if there are nulls in other columns, it shouldn't matter because I'm deselecting one side of the Join and I'm only Joining the identifier column anyway, right?

