I have a dataset that runs daily through a workflow that gives me an output that we will call Output-BOM-TIMESTAMP. I need to check and compare data amongst both the files where the new timestamp file will be considered the latest.
1) I would like to compare both the files considering "Output-BOM-5-26-2022" to be the latest. We will have to remove any row or record that is not present in the new file considering P-B-S column to always be unique.
Output-BOM-5-26-2022
| PBS | Prefix | Base | Suffix | Cost |
| P1-B1-S1 | P1 | B1 | S1 | 150 |
| P2-B2-S2 | P2 | B2 | S2 | 250 |
| P3-B3-S3 | P3 | B3 | S3 | 350 |
| P4-B4-S4 | P4 | B4 | S4 | 550 |
| P6-B6-S6 | P6 | B6 | S6 | 650 |
Output-BOM-5-25-2022
| PBS | Prefix | Base | Suffix | Cost |
| P1-B1-S1 | P1 | B1 | S1 | 100 |
| P2-B2-S2 | P2 | B2 | S2 | 200 |
| P3-B3-S3 | P3 | B3 | S3 | 300 |
| P5-B5-S5 | P5 | B5 | S5 | 400 |
| P6-B6-S6 | P6 | B6 | S6 | 650 |
2) I would like to now process through the data and create two new columns in the Output file. "Action" and "Change". In Action, we have 3 categories. "Add", "Revised", "No Change" depending on the change in Cost values from one dataset to another.
In Change, I would like to calculate the difference in Cost from one dataset to another.
I am adding the expected output down below and if anyone could help me with this workflow. It would be great.
| PBS | Prefix | Base | Suffix | Action | Change |
| P1-B1-S1 | P1 | B1 | S1 | Revised | 50 |
| P2-B2-S2 | P2 | B2 | S2 | Revised | 50 |
| P3-B3-S3 | P3 | B3 | S3 | Revised | 50 |
| P4-B4-S4 | P4 | B4 | S4 | Add | 550 |
| P6-B6-S6 | P6 | B6 | S6 | No Change | 0 |