Hello,
For this problem, I am looking for an explanation more than an actual solution. I tried to google but somehow, I didn't get any answer.
Can anyone please explain how Alteryx generate duplicate rows? In this scenario, I am joining 3 files. First file has 15K rows, second file has 62K rows and the third file has 165K rows. I used 2 join tools using the same parameter - first tool to join first and second files; the second tool to join the output from the first join tool and third file. Overall, Alteryx generated 36M rows. Although I can use the unique tool to get unique items only, I still want to understand why I got 36M rows for an output.
If there is already an available reading material about this somewhere, appreciate if anyone can share that as well.
Thank you so much.
Solved! Go to Solution.
Hi @crazybeauti_ful ,
Given how you set up the join you may get multiple matches between files. For example if File 1 has today's date in it 04/07/2020, and file 2 has 5 records in it with today's date, and you match on date you will get 5 records in the match.
This can also happen if you have a significant amount of null values that you are joining on between files. If file 1 has 5 null values and file 2 has 10 null values you will get 50 records in a join on that field.
hey @crazybeauti_ful check out this knowledge base article on why you're duplicating records.
You probably need to be joining on multiple fields rather than just one.
Thanks, let me know if that answers your question.
@crazybeauti_ful this is what is known as a "many to many" relationship. When you join on the columns in each table, it looks for every match of values between the two tables.
ID Fruit
1 Apple
2 Banana
3 Orange
2 Banana
ID Quantity
1 23
2 43
3 76
1 34
Would give you
ID Fruit ID Quantity
1 Apple 1 23
2 Banana 2 43
2 Banana 2 43
3 Orange 3 76
1 Apple 1 34
Notice that even though there is only one apple record in the first table, it matches the ID number 1 two times in the second table and therefore both rows will be created in the output. Also, the Banana row shows up twice as well because it exists twice in the first table. This goes both ways so if you have duplicates of the IDs in both of your tables, you will create records based on the matches found.
User | Count |
---|---|
17 | |
15 | |
15 | |
8 | |
6 |