I have a situation where I need to remove some duplicates from a dataset. For example, i need to remove 4 of the 8 records that have the same record ID. Is there a better way to do this than simply filtering to the record ID and using the sample tool? I'm trying to figure out the best way to do this. Obviously the unique tool won't work. Thanks in advance.
record id | count should be | count is |
1 | 2 | 4 |
2 | 4 | 8 |
3 | 4 | 8 |
4 | 8 | 16 |
Solved! Go to Solution.
Hmmm. Could you add a second level "RecordID2" (using a multi-row tool) to add a RecordID within each RecordID grouping... then join the desired count by original RecordID... then filter out any "RecordID2" that is greater than the desired count?
RecordID | RecordID2 | DesiredCount | Included when filtering for RecordID2 <= DesiredCount? |
1 | 1 | 2 | Yes |
1 | 2 | 2 | Yes |
1 | 3 | 2 | No |
1 | 4 | 2 | No |
2 | 1 | 4 | Yes |
2 | 2 | 4 | Yes |
2 | 3 | 4 | Yes |
2 | 4 | 4 | Yes |
2 | 5 | 4 | No |
2 | 6 | 4 | No |
2 | 7 | 4 | No |
2 | 8 | 4 | No |
How about an iterative macro? This will effectively divide the records returned by half, and keep the remainder. For example:
1 record, 1 record returned
2 records, 1 record returned
3 records, 2 records returned
4 records, 2 records returned
5 records, 3 records returned
6 records, 3 records returned
etc...
Is this the logic you are looking for, or is it something else?
Like @NicoleJohnson suggested, here is a workflow that joins the data to a table (to assign max rows per record id). It then counts rows and keeps only the needed dupes.
Cheers,
Mark
Tried out the sample and it works great, I altered the multirow formula to "IIF([RecordID]=[Row-1:RecordID], [Row-1:RowNum]+1, 1)" and all expected can be displayed.