First post in this forum, I will try my best.
I have a file that contains all government grant data for the year 2021. 5.5 million rows and around 300 columns. Three of the columns are important to me: period_of_performance_potential_start_date, period_of_performance_potential_end_date and sum_potential_total_value_of_award.
The start and end dates vary by contract and what i would like to do it to add columns for every month between the earliest date in the start date column and the latest date in the end date column. Then, for each contract, split the sum_potential_total_value evenly across the appropriate months in the date range for that contract.
How it is now | ||||||||||||
period_of_performance_potential_start_date | period_of_performance_potential_start_date | sum_potential_total_value_of_award | ||||||||||
Contract Row 1 | 3/1/2021 | 12/1/2021 | 250,000,000 | |||||||||
Contract Row 2 | 6/1/2021 | 11/1/2021 | 100,000,000 | |||||||||
What I want | ||||||||||||
3/1/2021 | 4/1/2021 | 5/1/2021 | 6/1/2021 | 7/1/2021 | 8/1/2021 | 9/1/2021 | 10/1/2021 | 11/1/2021 | 12/1/2021 | |||
Contract row 1 | 25,000,000 | 25,000,000 | 25,000,000 | 25,000,000 | 25,000,000 | 25,000,000 | 25,000,000 | 25,000,000 | 25,000,000 | 25,000,000 | ||
Contract row 2 | 16,666,667 | 16,666,667 | 16,666,667 | 16,666,667 | 16,666,667 | 16,666,667 |
Thank you everyone!
Hi @sberhalter ,
I must be blind but i can't find either period_of_performance_potential_start_date AND sum_potential_total_value_of_award in the attached dataset?
Greetings,
Seb
My apologies:
potential_total_value_of_award
period_of_performance_potential_end_date
period_of_performance_start_date
@sberhalter
There are a lots of columns, if I am doing it correlty. 😂
Wow! Thank you 🙂
This works perfectly on the sample set but on the full set I run into issues with the cross-tab node and it adds a bunch of months that aren't actually contained in the start/end date columns. I have attached the full file so you can see what I mean.