Hi,
I'm trying to transform a table to create some features (new columns) for a statistical model.
My initial dataset looks like this:
| Year | dif_year | user_name | field1 | … | field30 |
| 2022 | 0 | bob | aaa | … | mmm |
| 2022 | 0 | bill | bbb | … | nnn |
| 2022 | 0 | will | ccc | … | ooo |
| 2021 | 1 | bob | ddd | … | ppp |
| 2021 | 1 | bill | eee | … | qqq |
| 2021 | 1 | will | fff | … | rrr |
| 2020 | 2 | bob | ggg | … | sss |
| 2020 | 2 | bill | hhh | … | ttt |
| 2020 | 2 | will | iii | … | uuu |
| 2019 | 3 | bob | jjj | … | vvv |
| 2019 | 3 | bill | kkk | … | xxx |
| 2019 | 3 | will | lll | … | zzz |
| … | … | … | … | … | … |
And I wish to transform it into a dataset like this:
| user_name | field1_dif_year_0 | …_dif_year_0 | field30_dif_year_0 | field1_dif_year_1 | …_dif_year_1 | field30_dif_year_1 | field1_dif_year_2 | …_dif_year_2 | field30_dif_year_2 | field1_dif_year_3 | …_dif_year_3 | field30_dif_year_3 |
| bob | aaa | … | mmm | ddd | … | ppp | ggg | … | sss | jjj | … | vvv |
| bill | bbb | … | nnn | eee | … | qqq | hhh | … | ttt | kkk | … | xxx |
| will | ccc | … | ooo | fff | … | rrr | iii | … | uuu | lll | … | zzz |
| … | … | … | … | … | … | … | … | … | … | … | … | … |
The main challenge is the number of fields (30+).
Any help would be very much appreciated.
Best Regards!
@mauricio
I hope I understand your logic correctly.
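For readers who want to see the reshaping spelled out, here is a minimal sketch in Python with pandas. It is only an illustration under stated assumptions, not necessarily the workflow discussed in this thread: the helper name `widen` and the toy data (two fields instead of 30+) are made up for the example, and it assumes each `user_name` appears at most once per `dif_year`. The field columns are discovered automatically, so the 30+ fields do not need to be listed by hand.

```python
import pandas as pd

# Toy data mirroring the question's layout (assumption: only two of the 30+ fields).
long_df = pd.DataFrame({
    "Year":      [2022, 2022, 2021, 2021],
    "dif_year":  [0, 0, 1, 1],
    "user_name": ["bob", "bill", "bob", "bill"],
    "field1":    ["aaa", "bbb", "ddd", "eee"],
    "field30":   ["mmm", "nnn", "ppp", "qqq"],
})


def widen(df, key="user_name", offset="dif_year", drop=("Year",)):
    """Hypothetical helper: one row per `key`, one column per (field, offset) pair."""
    # Every column that is not the key, the offset, or explicitly dropped is a field.
    value_cols = [c for c in df.columns if c not in (key, offset, *drop)]

    # pivot raises ValueError if a (key, offset) pair occurs more than once.
    wide = df.pivot(index=key, columns=offset, values=value_cols)

    # Order the columns so that all fields for offset 0 come first, then offset 1, ...
    offsets = sorted(df[offset].unique().tolist())
    ordered = [(field, off) for off in offsets for field in value_cols]
    wide = wide.reindex(columns=ordered)

    # Flatten the (field, offset) column pairs into names like field1_dif_year_0.
    wide.columns = [f"{field}_{offset}_{off}" for field, off in ordered]
    return wide.reset_index()


wide_df = widen(long_df)
print(wide_df)
```

The explicit `ordered` list is what keeps the columns grouped by `dif_year` in the original field order; without it the pivoted columns come out grouped by field instead.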
@mauricio
Thank you for the feedback, and glad to be of help.
@flying008
Your flow is indeed better, since it takes the sequence into account.
Thank you.
