First post in this forum, I will try my best.
I have a file that contains all government grant data for the year 2021. 5.5 million rows and around 300 columns. Three of the columns are important to me: period_of_performance_potential_start_date, period_of_performance_potential_end_date and sum_potential_total_value_of_award.
The start and end dates vary by contract and what i would like to do it to add columns for every month between the earliest date in the start date column and the latest date in the end date column. Then, for each contract, split the sum_potential_total_value evenly across the appropriate months in the date range for that contract.
| How it is now | | | | | | | | | | | |
| | | | period_of_performance_potential_start_date | period_of_performance_potential_start_date | sum_potential_total_value_of_award | | | | | | | |
| | Contract Row 1 | 3/1/2021 | 12/1/2021 | 250,000,000 | | | | | | | |
| | Contract Row 2 | 6/1/2021 | 11/1/2021 | 100,000,000 | | | | | | | |
| | | | | | | | | | | | | |
| What I want | | | | | | | | | | | |
| | | | 3/1/2021 | 4/1/2021 | 5/1/2021 | 6/1/2021 | 7/1/2021 | 8/1/2021 | 9/1/2021 | 10/1/2021 | 11/1/2021 | 12/1/2021 |
| | Contract row 1 | 25,000,000 | 25,000,000 | 25,000,000 | 25,000,000 | 25,000,000 | 25,000,000 | 25,000,000 | 25,000,000 | 25,000,000 | 25,000,000 |
| | Contract row 2 | | | | 16,666,667 | 16,666,667 | 16,666,667 | 16,666,667 | 16,666,667 | 16,666,667 | |
Thank you everyone!