Hello!
I am working with a dataset that contains duplicates. It looks something like this:
Building ID | Field 1 | Field 2 |
1 | 1234 Maple St | 7:00pm 6-6-2022 |
2 | 123 Yellow St | 6:00pm 4-5-2022 |
1 | 123 Blue St | 5:33pm 2-5-2022 |
2 | 123 Yellow St | 3:00pm 6-6-2022 |
3 | 123 Green Ave | 3:00pm 4-5-2022 |
1 | 1234 Maple St | 7:10pm 6-6-2022 |
I am trying to remove duplicates based on multiple columns. Some users submit data with the same building ID and the same address, but at a later time. I want to remove the earlier submission, since I assume the more recent one is the intended submission (ex: I would want to remove the older 123 Yellow St). This becomes more complicated because some users submit data with the same building ID for different addresses (ex: building ID 1 has two different addresses but three submissions). In that case it should end up with two submissions, removing only the older copy of the duplicated address.
Does anyone have any suggestions on how I can clean this up?
Thanks
*Edit: it is okay for there to be duplicate building IDs; it is not okay for there to be duplicate Field 1 values.
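For reference, here is a minimal sketch of the rule described above, assuming the data is loaded into a pandas DataFrame (the post does not say which tool is in use, so pandas is an assumption). It also assumes Field 2 is formatted as hour:minute am/pm followed by month-day-year, and the helper column name `submitted` is made up for illustration. The idea is to parse the timestamp, sort by it, and keep only the last row for each building ID and address pair.

```python
import pandas as pd

# Sample rows copied from the table in the post.
df = pd.DataFrame({
    "Building ID": [1, 2, 1, 2, 3, 1],
    "Field 1": ["1234 Maple St", "123 Yellow St", "123 Blue St",
                "123 Yellow St", "123 Green Ave", "1234 Maple St"],
    "Field 2": ["7:00pm 6-6-2022", "6:00pm 4-5-2022", "5:33pm 2-5-2022",
                "3:00pm 6-6-2022", "3:00pm 4-5-2022", "7:10pm 6-6-2022"],
})

# Assumption: timestamps are "h:mm(am|pm) month-day-year".
# Swap %m and %d if the dates are actually day-month-year.
df["submitted"] = pd.to_datetime(df["Field 2"], format="%I:%M%p %m-%d-%Y")

deduped = (
    df.sort_values("submitted")            # oldest first, newest last
      .drop_duplicates(subset=["Building ID", "Field 1"], keep="last")
      .sort_index()                        # restore the original row order
      .drop(columns="submitted")           # remove the helper column
)

print(deduped)
```

On the sample data this leaves four rows: the 7:10pm submission for 1234 Maple St, the 6-6-2022 submission for 123 Yellow St, and the single rows for 123 Blue St and 123 Green Ave. If an address must be unique regardless of building ID, as the edit suggests, passing `subset=["Field 1"]` instead would be the stricter variant.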