Alteryx Designer Desktop Discussions

SOLVED

Simple join is blowing up, help?

Watermark
12 - Quasar

Can anyone explain what's going on here? I have a 24 GB file with 31M lines joining to a 15 MB file with 51k lines. You can see in the stats that the join is blowing up.

[Screenshot: Join blow up.jpg]

AngelosPachis
16 - Nebula

Hi @Watermark,

That's a common issue with a join when both the L and R inputs contain duplicate values in the join field. There are many posts in the community that address this:

https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Join-returns-too-many-records/td-p/308...

https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Why-My-Join-Is-Getting-More-Records-th...
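
To see why the row count explodes: a join emits one output row for every left/right pair that shares a key, so a key that appears m times in L and n times in R produces m × n joined rows. Below is a minimal Python sketch of that arithmetic. Python is used only for illustration, the function name and sample keys are hypothetical, and this is not how Alteryx implements its Join tool.

```python
from collections import Counter
from typing import Hashable, Iterable


def expected_join_rows(left_keys: Iterable[Hashable], right_keys: Iterable[Hashable]) -> int:
    """Rows an inner join produces: for each shared key, left count times right count."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    # Keys that exist on only one side contribute nothing to the joined output.
    return sum(n * right_counts[k] for k, n in left_counts.items() if k in right_counts)


# Hypothetical sample: "a" appears 3 times on the left and twice on the right.
left = ["a", "a", "a", "b", "c"]
right = ["a", "a", "b", "d"]
print(expected_join_rows(left, right))  # 3*2 + 1*1 = 7
```

Running the same count on the join field of both inputs before joining shows which keys account for most of the growth.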

The most common solution is either to put a Unique or Summarize tool before your join, or to increase the number of fields you are joining on. Since you're also working with that many records, I'd suggest exploring the Calgary tool palette. It indexes your database, so your workflow will run much faster.
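
A minimal Python sketch of the deduplicate-first fix, assuming each input is a list of dicts joined on a single field (here a hypothetical `url` column). Keeping only the first row per key on one side, roughly what a Unique tool does, means each left row can match at most one right row. The helper names `unique_by` and `inner_join` are illustrative only, not part of any Alteryx API.

```python
from typing import Dict, List

Row = Dict[str, object]


def unique_by(rows: List[Row], key: str) -> List[Row]:
    """Keep the first row seen for each value of `key` (similar in spirit to a Unique tool)."""
    seen = set()
    kept: List[Row] = []
    for row in rows:
        value = row[key]
        if value not in seen:
            seen.add(value)
            kept.append(row)
    return kept


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Join on one field: every left row pairs with every right row sharing its key."""
    index: Dict[object, List[Row]] = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    joined: List[Row] = []
    for l_row in left:
        for r_row in index.get(l_row[key], []):
            # Right-side values win if both rows have a field with the same name.
            joined.append({**l_row, **r_row})
    return joined


# Hypothetical data: the same URL appears 3 times on the left and twice on the right.
left = [{"url": "https://example.com/a", "visit": i} for i in range(3)]
right = [{"url": "https://example.com/a", "tag": t} for t in ("x", "y")]

print(len(inner_join(left, right, "url")))                    # 6 (3 x 2)
print(len(inner_join(left, unique_by(right, "url"), "url")))  # 3
```

Deduplicating is only safe when the repeated rows really are duplicates; if they differ in other fields, keeping the first one silently drops information, which is when adding more join fields is the better option.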

Hope that helps

Angelos

Watermark
12 - Quasar

Angelos, 

It's a simple CSV joining to a spreadsheet, and there's only one field to join on: the URL. I'm going to go look at the two links you posted.

Emil_Kos
17 - Castor

Hi @Watermark,

It's also worth mentioning that empty or null values in the field you join on can create thousands of duplicates as well, so keep that in mind every time you use the Join tool.
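
A quick way to check for this before joining, sketched in Python under the assumption that rows are dicts and that `None` and whitespace-only strings both count as missing (a hypothetical rule; adjust it to your data). If blank keys match one another, then, for example, 1,000 blank keys on the left and 500 on the right would by themselves produce 1,000 × 500 = 500,000 joined rows (hypothetical numbers). Whether a given tool matches null to null varies (SQL, for instance, never treats NULL as equal to NULL), so setting these rows aside explicitly avoids depending on that behavior.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def is_blank(value: object) -> bool:
    """Assumed rule: None and empty or whitespace-only strings count as a missing key."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def split_blank_keys(rows: List[Row], key: str) -> Tuple[List[Row], List[Row]]:
    """Return (rows with a usable join value, rows whose join value is blank)."""
    usable: List[Row] = []
    blank: List[Row] = []
    for row in rows:
        # row.get returns None for a missing column, so that also counts as blank.
        (blank if is_blank(row.get(key)) else usable).append(row)
    return usable, blank


# In plain Python, "" == "" and None == None, so a naive dict-keyed join
# would pair every blank key on one side with every blank key on the other.
rows = [{"url": "https://example.com/a"}, {"url": ""}, {"url": None}, {"url": "   "}]
usable, blank = split_blank_keys(rows, "url")
print(len(usable), len(blank))  # 1 3
```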

JBLove
10 - Fireball

@Watermark,

Are you expecting there to be only one row per URL? 

If so, but the data has more than one row per URL, you may need to investigate which other data elements are causing the URLs to appear on multiple rows. Perhaps a filter needs to be applied to the data, or you can pare down the number of columns and follow @AngelosPachis's suggestion of using the Summarize tool to remove duplicates.
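
One way to do that investigation, sketched in Python with hypothetical names and sample rows: group the rows by URL and, for every URL that appears more than once, list the columns whose values differ across its rows. An empty result means there is already one row per URL; a URL mapped to an empty set has exact duplicate rows that a Unique or Summarize step could collapse safely.

```python
from collections import defaultdict
from typing import Dict, List, Set

Row = Dict[str, object]


def varying_columns_by_key(rows: List[Row], key: str) -> Dict[object, Set[str]]:
    """For each key value on more than one row, return the columns whose values differ."""
    groups: Dict[object, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    report: Dict[object, Set[str]] = {}
    for value, group in groups.items():
        if len(group) < 2:
            continue  # this key already has exactly one row
        columns = {c for row in group for c in row if c != key}
        # repr() lets us compare values that may not be hashable (e.g. lists).
        report[value] = {c for c in columns if len({repr(row.get(c)) for row in group}) > 1}
    return report


rows = [
    {"url": "/a", "date": "2020-01-01", "clicks": 5},
    {"url": "/a", "date": "2020-01-02", "clicks": 5},
    {"url": "/b", "date": "2020-01-01", "clicks": 2},
]
print(varying_columns_by_key(rows, "url"))  # {'/a': {'date'}}
```

Here the repeated `/a` rows differ only by `date`, which would suggest either filtering to one date or adding `date` as a second join field.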

Watermark
12 - Quasar

Yep, an enormous number of duplicates (not expected, lesson learned), as well as a hefty chunk of nulls (also not expected). Thanks for the help.
