How do I do an only unique on 2 fields without using the only unique tool by crew?
Solved! Go to Solution.
Looking at the logic behind the Only Unique macro contains your answer. If you use a unique tool followed by another unique tool connected to the D anchor, join that back to the U anchor in the first unique tool and then only look at the L anchor from the join tool will give you your answer. You will want to make sure that the same selections are made in both unique tools and that the join is joining on the fields that are being uniqued on as well.
Hi @clarrock
Here's another way to do it.
Group by the unique fields and count the number of occurrences of each combination. Select the ones that only appear once and then join back to the original data.
Dan
In terms of performance, the double Unique method from the CReW macro outperforms the Summarize method by about 2:1.
The attached workflow generates 10M rows with some duplicates. The message tool is configured to output the time after the last record has been read from cache. After running the workflow once to cache the results, disable each of the containers in turn and run. On my computer, the double Unique algorithm executes in about 50% of the time that the Summarize one does, once you factor in the amount the of time it takes to read the data from cache.
Dan
@BrandonB This worked! But what would be the only uniques duplicate anchor for Join. Join L anchor is Unique. Is it R or J anchor Duplicates?
The L on the join will be the only uniques because they are all of the values that don’t have a duplicate at all in the D anchor of the unique tool. If you consider the way the unique tool operates, it passes the first unique value of each value it finds through the U and if it ever finds that value again it passes through the D. So the idea of using another unique tool after the D and the left side of the join tool is to isolate the values that don’t match anything from the duplicates list and therefore are completely unique.
If you notice the inner join and the union along with the duplicates anchor in the picture, it is looking at all of the duplicates along with any of the values from the U anchor which would be your “only duplicates”. This is because it not only has to consider the duplicates from the bottom but also extract the values that are also existing in the U anchor which is the first instance of each duplicate value in the data set.
Step 1: UNIQUE
step 2: join (same values as unique) the unique output to the duplicate output from the join tool
Left Output anchor are truly unique records
cheers,
mark