Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

How do Alteryx generate duplicate rows?

crazybeauti_ful
8 - Asteroid

Hello,

 

For this problem, I am looking for an explanation more than an actual solution. I tried to google but somehow, I didn't get any answer.

 

Can anyone please explain how Alteryx generate duplicate rows? In this scenario, I am joining 3 files. First file has 15K rows, second file has 62K rows and the third file has 165K rows. I used 2 join tools using the same parameter - first tool to join first and second files; the second tool to join the output from the first join tool and third file. Overall, Alteryx generated 36M rows. Although I can use the unique tool to get unique items only, I still want to understand why I got 36M rows for an output.

 

If there is already an available reading material about this somewhere, appreciate if anyone can share that as well.

 

Thank you so much.

3 REPLIES 3
ggruccio
ACE Emeritus
ACE Emeritus

Hi @crazybeauti_ful ,

 

Given how you set up the join you may get multiple matches between files.  For example if File 1 has today's date in it 04/07/2020, and file 2 has 5 records in it with today's date, and you match on date you will get 5 records in the match.

 

This can also happen if you have a significant amount of null values that you are joining on between files.  If file 1 has 5 null values and file 2 has 10 null values you will get 50 records in a join on that field.

Blake
12 - Quasar

hey @crazybeauti_ful check out this knowledge base article on why you're duplicating records. 

 

https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Why-Your-Join-Is-Getting-More-Recor...

 

You probably need to be joining on multiple fields rather than just one. 

 

Thanks, let me know if that answers your question. 

BrandonB
Alteryx
Alteryx

@crazybeauti_ful this is what is known as a "many to many" relationship. When you join on the columns in each table, it looks for every match of values between the two tables. 

 

ID    Fruit

1     Apple

2     Banana

3     Orange

2     Banana

 

ID    Quantity

1     23

2     43

3     76

1     34

 

Would give you

 

ID    Fruit        ID       Quantity

1     Apple       1         23

2     Banana    2        43

2     Banana    2        43

3     Orange    3        76

1     Apple       1        34

 

Notice that even though there is only one apple record in the first table, it matches the ID number 1 two times in the second table and therefore both rows will be created in the output. Also, the Banana row shows up twice as well because it exists twice in the first table. This goes both ways so if you have duplicates of the IDs in both of your tables, you will create records based on the matches found. 

Labels