I currently have a data set which has three classifications
- Draft
- Valid
- Invalid
As a Document goes from draft to valid to invalid a new data line is created and the old one remains in the data set. This is causing issues and I therefore only want to keep the latest data line.
What I need to do is group the Document IDs and only keep the most recent record, however I cannot work out logic that doesn't disrupt the rest of my data.
Below is a sample of data to help explain the issue:
| Document ID | Document Title | Status | Contact ID | Contact | Document Valid from | Document Valid to | Values |
| 4001 | Title A | Active | 123 | ABC | 31/08/2016 12:00 AM | 01/04/2020 12:00 AM | 500 |
| 4002 | Title B | Draft | 124 | ABD | 01/05/2019 12:00 AM | 01/06/2020 12:00 AM | 752 |
| 4003 | Tile C | Active | 125 | ABE | 06/07/2017 12:00 AM | 31/03/2023 12:00 AM | 425 |
| 4001 | Title A | Inactive | 123 | ABC | 02/03/2016 12:00 AM | 01/04/2021 12:00 AM | 500 |
| 4002 | Title B | Active | 124 | ABD | 02/04/2014 12:00 AM | 03/04/2020 12:00 AM | 752 |
| 4001 | Title A | Draft | 123 | ABC | 06/08/2018 12:00 AM | 06/08/2023 12:00 AM | 120 |
| 4004 | Title D | Draft | 126 | ABF | 01/09/2016 12:00 AM | 22/02/2022 12:00 AM | 681 |
The desired output would be:
| Document ID | Document Title | Status | Contact ID | Contact | Document Valid from | Document Valid to | Values |
| 4003 | Tile C | Active | 125 | ABE | 06/07/2017 12:00 AM | 31/03/2023 12:00 AM | 425 |
| 4001 | Title A | Inactive | 123 | ABC | 02/03/2016 12:00 AM | 01/04/2021 12:00 AM | 500 |
| 4002 | Title B | Active | 124 | ABD | 02/04/2014 12:00 AM | 03/04/2020 12:00 AM | 752 |
| 4004 | Title D | Draft | 126 | ABF | 01/09/2016 12:00 AM | 22/02/2022 12:00 AM | 681 |
Any help or guidance appreciated