Hello all,
I have a dataset, and I only want the subset of the data where there are at least 10 entries for a given identifier. I used a Summarize tool to "group by" my identifier and count the number of times each identifier shows up. There are about 8,000 unique identifiers, and about 1,300 of these have more than 10 repeats. How can I continue working on JUST that 1,300 subset that has more than 10 repeats?
Thank you!
Hi @greenv1nes
If you post some sample data I could mock something up, but essentially once you have your group-by and count, filter that to >= 10, then join that back to the original data on the identifier. The J output of the join will have just the identifiers with a count greater than or equal to 10.
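For anyone who wants to see the same logic outside of a workflow, here is a rough Python sketch of the three steps (count per identifier, filter to >= 10, join back to the original rows). The sample rows, the `MIN_COUNT` name, and the (identifier, value) layout are all made up for illustration, so treat it as an outline of the idea rather than what the tools do internally.

```python
from collections import Counter

# Hypothetical sample data: each row is (identifier, value).
rows = (
    [("A", i) for i in range(12)]    # 12 entries, should be kept
    + [("B", i) for i in range(3)]   # 3 entries, should be dropped
    + [("C", i) for i in range(10)]  # exactly 10 entries, should be kept
)

MIN_COUNT = 10  # assumed threshold, taken from "at least 10 entries"

# Step 1 (Summarize): group by identifier and count occurrences.
counts = Counter(identifier for identifier, _ in rows)

# Step 2 (Filter): keep only identifiers whose count is >= MIN_COUNT.
keep = {identifier for identifier, n in counts.items() if n >= MIN_COUNT}

# Step 3 (Join): keep the original rows whose identifier survived the filter.
# This plays the role of the J output of a join on the identifier.
subset = [row for row in rows if row[0] in keep]

print(sorted(keep))   # ['A', 'C']
print(len(subset))    # 22
```

Note that the threshold is inclusive here, so an identifier with exactly 10 entries is kept. If you really want "more than 10", change the comparison to `n > MIN_COUNT`.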
Hi @greenv1nes
One way is to use a Join tool.
Left: connect to your original data
Right: connect to (original data grouped by identifier, count occurrences, filter to keep only IDs where count >= 10)
Then join on the identifier; the J output of the Join will give you the required subset.
Dawn
First attach a Filter to the Summarize output to drop the identifiers with counts less than 10, then join your original dataset to that filtered output on the identifier.
It worked! Thanks so much.
It worked! Thanks so much, Dawn.