Alteryx Designer Desktop Discussions

novalcia · ‎01-23-2024

This is my workflow:

Screenshot 2024-01-23 215322.png

My input tool at the top: 2.4 GB

My input tool at the bottom: 90kb

In the first Join tool: 4.5 GB

in the second Join tool: 350 GB >> alteryx crashed due to lack of temp drive space.

Why does this happen? How does the data become so big in the middle like that? How do I fix this? I can't upgrade my RAM and stuff because this is on a work laptop.

danilang · ‎01-23-2024

Hi @novalcia

Things to look for

After your first join, you've created a copy of each of your top input top rows leading usage to double. Your second join creates ~70 copies of the top input.
Examine the configuration of your join tools. specifically the fields that you're joining on. If you're not including every Key Field from the first table in the join, your row count will increase.
If you have duplicates in the bottom input, these will cause multiple rows.
If any of your joined fields have null values this can cause the row count to balloon since every null on the right will match with every null on the left

Dan

Raj · ‎01-23-2024

@novalcia Please check the data Quality as rightly said by @danilang null or empty values unwantedly increase the size of data to huge no suppose you have 100 null in 1st input and 100 null in 2nd input this will generate 10000 records which will definitely be a issue.
also i suggest to add block until done after first join tool and try running it.

apathetichell · ‎01-23-2024

drop un-needed extra fields. when joining on large files use autofields to lower string sizes. drop duplicate fields.

short is - we can't 100% tell you - for all we know one of these tools has a blob field which is huge - but those are the standard places I'd look.... also - don't be afraid of using In-DB tools if this data/some of this data is in a Database.

novalcia · ‎01-23-2024

This is my Join tools configuration:

This is my first Join tool, and I deselected the only right input.

Screenshot 2024-01-24 174045.png

This is the second join tool, and I deselected all right inputs, because they're duplicates and I only need the left ones.

Screenshot 2024-01-24 174404.png

I just learned that my data may be ballooning because of Cartesian Join? But how is that possible? I only joined the Acc Num (in join 1) and Identifier (in join 2), I didn't join multiple columns.Shouldn't it output only those that have the same identifier (for join 2)? And I know there are no nulls in my identifier column.Even if there are nulls in other columns, it shouldn't matter because I'm deselecting one side of the Join and I'm only Joining the identifier column anyway, right?

Alteryx Designer Desktop Discussions

Why is my data getting so big in the middle of the workflow?

Añadir data range a workflow

Split Big Excel file into small chunks by ensuring...

Re: Adding Data to Your Workflow

Re: Onboarding 1: Get to Know Alteryx

Unable to write data to a google big query table

Re: Outlook 365 Input tool issue

Re: parsing text to date

Re: If formula to determined a value based on date...

Re: Running multiple alteryx workflows within alte...

Re: Selecting the columns coming after a specific ...