Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Why is my data getting so big in the middle of the workflow?

novalcia
7 - Meteor

This is my workflow:

 

Screenshot 2024-01-23 215322.png

My input tool at the top: 2.4 GB

My input tool at the bottom: 90kb

In the first Join tool: 4.5 GB

in the second Join tool: 350 GB >> alteryx crashed due to lack of temp drive space.

 

Why does this happen? How does the data become so big in the middle like that? How do I fix this? I can't upgrade my RAM and stuff because this is on a work laptop.

4 REPLIES 4
danilang
19 - Altair
19 - Altair

Hi @novalcia 

 

Things to look for

 

  • After your first join, you've created a copy of each of your top input top rows leading usage to double.  Your second join creates ~70 copies of the top input. 
  • Examine the configuration of your join tools. specifically the fields that you're joining on.  If you're not including every Key Field from the first table in the join, your row count will increase. 
  • If you have duplicates in the bottom input, these will cause multiple rows. 
  • If any of your joined fields have null values this can cause the row count to balloon since every null on the right will match with every null on the left

 

Dan  

Raj
16 - Nebula

@novalcia Please check the data Quality as rightly said by @danilang null or empty values unwantedly increase the size of data to huge no suppose you have 100 null in 1st input and 100 null in 2nd input this will generate 10000 records which will definitely be a issue.
also i suggest to add block until done after first join tool and try running it.

apathetichell
19 - Altair

drop un-needed extra fields. when joining on large files use autofields to lower string sizes. drop duplicate fields.

 

short is - we can't 100% tell you - for all we know one of these tools has a blob field which is huge - but those are the standard places I'd look.... also - don't be afraid of using In-DB tools if this data/some of this data is in a Database.

novalcia
7 - Meteor

This is my Join tools configuration:

 

This is my first Join tool, and I deselected the only right input.

Screenshot 2024-01-24 174045.png

 

This is the second join tool, and I deselected all right inputs, because they're duplicates and I only need the left ones.

 

Screenshot 2024-01-24 174404.png

 

I just learned that my data may be ballooning because of Cartesian Join? But how is that possible? I only joined the Acc Num (in join 1) and Identifier (in join 2), I didn't join multiple columns.Shouldn't it output only those that have the same identifier (for join 2)? And I know there are no nulls in my identifier column.Even if there are nulls in other columns, it shouldn't matter because I'm deselecting one side of the Join and I'm only Joining the identifier column anyway, right?

Labels
Top Solution Authors