Hi,
I'm trying to transform a table to create some features (new columns) for a statistical model.
My initial dataset looks like this:
Year | dif_year | user_name | field1 | … | field30 |
2022 | 0 | bob | aaa | … | mmm |
2022 | 0 | bill | bbb | … | nnn |
2022 | 0 | will | ccc | … | ooo |
2021 | 1 | bob | ddd | … | ppp |
2021 | 1 | bill | eee | … | qqq |
2021 | 1 | will | fff | … | rrr |
2020 | 2 | bob | ggg | … | sss |
2020 | 2 | bill | hhh | … | ttt |
2020 | 2 | will | iii | … | uuu |
2019 | 3 | bob | jjj | … | vvv |
2019 | 3 | bill | kkk | … | xxx |
2019 | 3 | will | lll | … | zzz |
… | … | … | … | … | … |
And I wish to transform it into a dataset like this:
user_name | field1_dif_year_0 | …_dif_year_0 | field30_dif_year_0 | field1_dif_year_1 | …_dif_year_1 | field30_dif_year_1 | field1_dif_year_2 | …_dif_year_2 | field30_dif_year_2 | field1_dif_year_3 | …_dif_year_3 | field30_dif_year_3 |
bob | aaa | … | mmm | ddd | … | ppp | ggg | … | sss | jjj | … | vvv |
bill | bbb | … | nnn | eee | … | qqq | hhh | … | ttt | kkk | … | xxx |
will | ccc | … | ooo | fff | … | rrr | iii | … | uuu | lll | … | zzz |
… | … | … | … | … | … | … | … | … | … | … | … | … |
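The target header follows a simple rule: for each `dif_year` value in ascending order, list every field in its original order, named `<field>_dif_year_<n>`. Here is a minimal Python sketch of that rule. It assumes the fields are literally named `field1` … `field30` (the placeholder names from the post) and that `output_columns` is just a hypothetical helper. It also shows why a plain alphabetical sort of the generated names would not give this order.

```python
def output_columns(fields, dif_years):
    """Build the wide header: all fields for dif_year 0, then 1, and so on."""
    return [f"{field}_dif_year_{n}" for n in sorted(dif_years) for field in fields]

# Placeholder field names, as in the post (field1 ... field30).
fields = [f"field{i}" for i in range(1, 31)]
cols = output_columns(fields, [3, 1, 0, 2])

print(cols[:2])     # ['field1_dif_year_0', 'field2_dif_year_0']
print(cols[29:31])  # ['field30_dif_year_0', 'field1_dif_year_1']

# A plain string sort puts field10 before field1 and field2 and mixes the years,
# so the intended order has to be built explicitly as above.
print(sorted(cols)[:3])  # ['field10_dif_year_0', 'field10_dif_year_1', 'field10_dif_year_2']
```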
The main challenge is the number of fields (30+).
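Because of the field count, it helps if the pivot discovers the fields itself instead of listing them by hand. Below is a rough stdlib-only Python sketch of that long-to-wide reshape. It treats every column except `user_name`, `dif_year` and `Year` as a field. The function name `pivot_wide` and its parameters are hypothetical, and the sample rows reuse two of the placeholder fields from the post.

```python
def pivot_wide(rows, key="user_name", period="dif_year", drop=("Year",)):
    """Turn one row per (user, dif_year) into one row per user.

    Every column that is not the key, the period or listed in `drop` is treated
    as a field, so field1 ... field30 never have to be named explicitly.
    """
    if not rows:
        return [], []
    skip = {key, period, *drop}
    fields = [c for c in rows[0] if c not in skip]  # keep input column order
    periods = sorted({r[period] for r in rows})

    # Header in the requested order: all fields for dif_year 0, then 1, ...
    header = [key] + [f"{f}_dif_year_{p}" for p in periods for f in fields]

    wide = {}  # dicts keep insertion order, so users stay in first-seen order
    for r in rows:
        out = wide.setdefault(r[key], {key: r[key]})
        for f in fields:
            out[f"{f}_dif_year_{r[period]}"] = r[f]

    # A user with no row for some dif_year gets None in those columns.
    table = [[rec.get(col) for col in header] for rec in wide.values()]
    return header, table


rows = [
    {"Year": 2022, "dif_year": 0, "user_name": "bob", "field1": "aaa", "field30": "mmm"},
    {"Year": 2022, "dif_year": 0, "user_name": "bill", "field1": "bbb", "field30": "nnn"},
    {"Year": 2021, "dif_year": 1, "user_name": "bob", "field1": "ddd", "field30": "ppp"},
    {"Year": 2021, "dif_year": 1, "user_name": "bill", "field1": "eee", "field30": "qqq"},
]
header, table = pivot_wide(rows)
print(header)  # ['user_name', 'field1_dif_year_0', 'field30_dif_year_0', 'field1_dif_year_1', 'field30_dif_year_1']
print(table)   # [['bob', 'aaa', 'mmm', 'ddd', 'ppp'], ['bill', 'bbb', 'nnn', 'eee', 'qqq']]
```

If a user had two rows for the same `dif_year`, the later row would silently overwrite the earlier one, so it is worth checking for duplicates first.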
Any help would be very much appreciated.
Best regards!
@mauricio
I hope I understand your logic correctly.
@mauricio
Thank you for the feedback; glad to be of help.
@flying008
Your flow is indeed better, since it takes the sequence into account.
Thank you.