Hello all, I've been beating my head on this for a week and just can't get all the way there. What I am trying to do below is first determine if there are two IDs that match AND has a Source 1 AND 2, if so I need to grab ALL those ID's that Match. You can see below the example. The first two columns are the input, the two columns on the right are the desired output. The above scenario is shown below, and as you can see below, the two bottom ID's that match, but there are not a Source 1 &2, so those are not included on the output. Is there a way to determine a match on ID AND determine if there is a source 1&2, if there is , then grab ALL ID's that fit that criteria.
I've tried Joins/Unique tool/etc, just can't get there.
Solved! Go to Solution.
Hi @dberroth ,
Try using a summarize tool in your input data.
Configure the summarize tool so you:
Group By: [Source]
Group By : [ID]
Count : [ID]
Then after your summarize tool, you have to add a filter tool that will use the condition:
Count_ID <2
Whatever falls out of the T output anchor is the desired output you need.
Cheers,
Angelos
Hi @dberroth
Exactly what @AngelosPachis said, just want to specifically call to do a 'distinct' count to weed out instances where the ID comes multiple times with the same source
wow, thanks everyone. that was too easy, which I assumed it would be. Curious though, technically I have more columns of data for those records and they get dropped off through the summarize tool, do I just rejoin them together or do I need to add all those columns in the summarize tool so those columns of data don't get dropped off.
I think your solution is getting me what I need, the only issue is I need that Source to stay with the data out of the summarization, otherwise I don't know which source it is and can't join it back up to the data, any ideas?
If you Group both on [ID] and [Source] you should be able to see both fields in the output
unfortunately, going by this solution I'm not getting there. Based off the requirements, only one ID (0443-04-99-99) should be being dropped off. Based off the below I'll be losing other ID's I need.
In the data you provided above @dberroth , there is no such ID as 0443-04-99-99.
Can you please provide us with a sample dataset? It helps when working on a task to have some data to play with.
Thanks,
Angelos
Now that I think about it @dberroth , another approach would be to group on [ID] and get the max of [Source].
If an [ID] only has an ID of 1, then the maximum [Source] will be 1, whilst for IDs that have a source of 2 the maximum [Source] will be 2.
You need to exclude those that have a maximum [Source] of 1, so you can add a filter tool to say [Max_Source]>1 and out of the T output anchor of the filter tool you will get all IDs that have a source of 1 and 2.
You only need to join back to the original dataset, to keep the records that you need falling out of the J output anchor
EDIT : Here's the concept I described above. Seems to yield the correct results
Cheers,
Angelos