Simple join is blowing up, help?

Can anyone explain what's going on here? I have a 24 GB file with 31M lines that I'm joining to a 15 MB file with 51k lines. You can see from the stats that the join is blowing up.

Join blow up.jpg

Join

Accepted answers

AngelosPachis

Hi @Watermark ,

That's a common issue with a join when both the L and R inputs have duplicate records in the join field. There are many posts in the community that address this:

https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Join-returns-too-many-records/td-p/308215

https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Why-My-Join-Is-Getting-More-Records-than-Expected/td-p/531159

The most common solution is either to put a Unique or Summarize tool before your join, or to increase the number of fields you are joining on. Since you work with that many records, I would also suggest exploring the Calgary tool palette. It indexes your database, and your workflow will run much faster.
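To see why duplicates on both sides inflate the output, here is a minimal Python sketch (not an Alteryx workflow). It assumes an inner join on a single key, where each key contributes left count times right count rows. The function names and sample URLs are hypothetical, made up for illustration only.

```python
from collections import Counter

def predicted_join_rows(left_keys, right_keys):
    """Rows an inner join on one key would produce:
    for each shared key, (count on left) * (count on right)."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    return sum(n * right_counts[k] for k, n in left_counts.items() if k in right_counts)

def dedupe_keep_first(rows, key):
    """Keep only the first row seen for each key value."""
    seen = set()
    out = []
    for row in rows:
        k = row[key]
        if k not in seen:
            seen.add(k)
            out.append(row)
    return out

# Hypothetical sample data: "a.com" is duplicated on both sides.
left_urls = ["a.com", "a.com", "a.com", "b.com"]
right_rows = [
    {"url": "a.com", "owner": "x"},
    {"url": "a.com", "owner": "y"},
    {"url": "b.com", "owner": "z"},
]

before = predicted_join_rows(left_urls, [r["url"] for r in right_rows])
right_unique = dedupe_keep_first(right_rows, "url")
after = predicted_join_rows(left_urls, [r["url"] for r in right_unique])
print(before, after)  # 7 4
```

With the right side deduplicated, the join returns one row per left record (4) instead of 7. Running the same count on the real key field before joining shows how large the output will be without actually running the join.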

Hope that helps

Angelos

All comments

Watermark

Angelos,

It's a simple CSV joining to a spreadsheet. There is only one field to join on, the URL. I'm going to go look at the two links you posted.

Emil_Kos

Hi @Watermark,

It is also worth mentioning that if the join field contains empty or null values, those will also create thousands of duplicates. Keep that in mind each time you use the Join tool.
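One way to check for this before joining is to count and set aside the blank keys on each side. This is a rough Python sketch, not Alteryx itself. The helper names and sample rows are hypothetical, and treating whitespace-only strings as blank is my own assumption.

```python
def is_blank(value):
    """True for None, empty strings, and whitespace-only strings."""
    return value is None or str(value).strip() == ""

def split_blank_keys(rows, key):
    """Return (rows with a usable key, rows with a blank key)."""
    usable, blank = [], []
    for row in rows:
        (blank if is_blank(row.get(key)) else usable).append(row)
    return usable, blank

# Hypothetical sample data with blank URLs on both sides.
left = [{"url": "a.com"}, {"url": ""}, {"url": None}, {"url": "  "}]
right = [{"url": "a.com"}, {"url": None}, {"url": ""}]

left_ok, left_blank = split_blank_keys(left, "url")
right_ok, right_blank = split_blank_keys(right, "url")
print(len(left_blank), len(right_blank))  # 3 2
```

Joining only `left_ok` to `right_ok` keeps the blank keys out of the match. The blank rows can be reviewed separately rather than silently dropped.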

Watermark

Yep, an enormous number of duplicates (not expected, lesson learned), as well as a hefty chunk of nulls (also not expected). Thanks for the help.
