This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
I have an Alteryx application that is checking for duplicates and similarities between two txt files and outputting to excel in multiple different sheets based on similarities/differences.
I am having an issue with the join tool at the end of my workflow. Incoming from the left i have 369 Records and from the right 364.
The output from the Join tool only has 7 records from the left, 2 records from the right, and 362 from the inner join.
Is there a way to have all records that are duplicate output from the inner join? I have to do further filtration to the inner join data on fields I did not join the 2 instances with, so I need all 700+ records.
I have attached screenshots below of the workflow.
Yes, the Unique tool will help you with that! You'll need to select which column you believe has duplicates. The "U" output will display all unique records while the "D" output will display all duplicates. That is where it sounds like you'll want to focus.
I've tried that, but no luck. The issue isn't identifying the unique/duplicate records right now, I am trying to figure out where the 300+ records went because I need to be able to work with those for my next steps. Does that make sense?
Yes, I think I understand you correctly. That sounds very odd that the # of records output from L, J, and R do not add up to the number of records input from L and R. Have you tried deleting the Join tool and trying again fresh? That tool should not be affecting the number of records, only rearranging them depending on the column joined. Can you share the workflow or a more clear screenshot showing the count of records per input/output?
Not really. Can you provide a mock up of what you're trying to achieve?
If you want to see the duplicates from the inner join, you need to determine what you mean by a duplicate. Is it where all fields are exactly the same? The key fields are the same?
You can start by using a sum tool and grouping by the fields you want to define as a duplicate if they are the same, including a count to show how many records are in that group. Then filtering out the records where that count =1. You're left with the "Duplicates".
@mceleavey I am joining them based on key columns (which does not include dates) which will output to an excel sheet. Next based on key columns, take the duplicates and on another sheet output which records have the same key columns but different dates.