Hi everyone,
I have a workflow I use to validate whether data was entered correctly into our system. In the workflow I join the file containing the data to be entered to a file containing an extract of the data actually entered into our system. In the join output I am seeing duplicates of certain items. I do not want to use a Unique tool because some items are intentionally duplicated. Can someone please help me figure out how to eliminate the unwanted duplicates?
This is my current output:
This would be my desired output:
Rows 2 and 3 of your data are identical. You are getting the extra records because of a 'cross join': the two duplicate keys each map to each other, producing 5 records in the output (1 record for the unique key, 2 × 2 = 4 records for the two duplicate keys). The usual fix is to find a secondary field to join on, but there doesn't appear to be another field here that would resolve this, since the records are total duplicates. The question becomes: do you expect these duplicate records in your data?
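To see the multiplication in action, here is a minimal sketch of the effect using pandas as a stand-in for the Join tool (the field names `Key` and `Amount` are made up for illustration):

```python
import pandas as pd

# Source file: one unique key plus two intentionally duplicated rows
to_enter = pd.DataFrame({"Key": ["A", "B", "B"], "Amount": [10, 20, 20]})
# System extract: the same three rows
extracted = pd.DataFrame({"Key": ["A", "B", "B"], "Amount": [10, 20, 20]})

# Joining on Key alone: each duplicate "B" matches both "B"s on the other side
joined = to_enter.merge(extracted, on="Key")
print(len(joined))  # 5 rows: 1 for key A, 2 x 2 = 4 for the duplicated key B
```

Three input records on each side become five output records, which matches the extra rows you are seeing.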
The question is how you know which records are intentionally duplicated. If you have that answer, we can build some logic to segregate them; however, if it is simply something another system does, it is difficult to fix without any supporting information.
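One common way to do that segregation, assuming the duplicates appear in a comparable order in both files, is to number the records within each key on both sides and then join on the key plus that sequence number, so duplicates pair one-to-one instead of cross-joining. A sketch with the same hypothetical fields:

```python
import pandas as pd

to_enter = pd.DataFrame({"Key": ["A", "B", "B"], "Amount": [10, 20, 20]})
extracted = pd.DataFrame({"Key": ["A", "B", "B"], "Amount": [10, 20, 20]})

# Number each record within its key group (0, 1, 2, ...) on both inputs
to_enter["Seq"] = to_enter.groupby("Key").cumcount()
extracted["Seq"] = extracted.groupby("Key").cumcount()

# Join on Key + Seq: the first "B" pairs with the first "B", and so on
joined = to_enter.merge(extracted, on=["Key", "Seq"])
print(len(joined))  # back to 3 rows, one per original record
```

In Alteryx terms this corresponds to generating a per-key record number on both inputs before the Join tool (for example with a Tile tool or a Multi-Row Formula), then joining on the key and that number.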