How can I match 2 datasets without having duplicates in the output
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello!
I want to match 2 datasets with multiple fields for matching criteria, and there are several records with same matching criteria in both datasets. However, I don't want to have duplication of records in the output. (i.e., if the records has been matched with other records in other dataset, it shouldn't been used for matching with other records in other dataset)
For your ease of reference, I have attached the sample of data workflow.
I have tried to find the solutions from the existing discussion, but couldn't find any. Really appreciate if any of you could help me here! This issues has been appearing in several workflows that I am creating.
Thank you!
- Labels:
- Join
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@PhilipMannering thanks for your reply! That will do if I don't need details per records level. However, I need to match the records side by side (not grouped by). Do you have any other solutions?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@123indo Can you provide the output you expect for the dummy data you provided in your example?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
if you want to single join the data sets then make them Unique. Then you can Union the duplicate and unjoined rows as you need them. I used my own test data and brought all rows back into the flow with the Union. Let me know if there are any additional requirements that need to be addressed.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I have put my expected output on the right hand side of my workflow. I hope it clarifies.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Yes, that will do for part of issues, but the output in the Duplicates (D) still need to be matched again.
For example, in my workflow earlier, I have add Unique before Join tool, but records bb100 in dataset 1 and dataset 2 which are the output in the Duplicates (D) are not yet matched:
I know I can add another Join tool for this. But the issues will arise if I don't know how many records can still be matched in the Duplicates (D) output. In the above example, there is only bb100 that is matched, but if I have thousands of rows data input, I wouldn't know how many more records that can still be matched - how many Unique and Join tools that I have to add.
I wonder if I can use other tool that are looping the formula to match, regardless the number of the records that is matched, without creating duplication of each dataset's records.