Hi,
I need help with transforming the input data on sequence of events by user_id into 2 outputs.
Note that number of event per user varies.
See examples if input and expected output below and in attached file:
Input spreadsheet
event_id | user_id | from_status | to_status | created_at | updated_at |
1 | 7 | active_not_posted | active_posted | 22:11.8 | 22:11.8 |
246 | 7 | active_posted | filled | 06:30.7 | 06:30.7 |
3 | 9 | active_not_posted | active_posted | 58:18.0 | 58:18.0 |
91 | 9 | active_posted | on_hold | 34:42.7 | 34:42.7 |
1744 | 9 | on_hold | cancelled | 05:27.3 | 05:27.3 |
1746 | 9 | active_not_posted | cancelled | 05:32.2 | 05:32.2 |
1745 | 9 | cancelled | active_not_posted | 05:29.8 | 05:29.8 |
Output1 spreadsheet
user_id | Event_sequence | Event | Event_date |
7 | 1 | active_not_posted | 22:11.8 |
7 | 2 | active_posted | 22:11.8 |
7 | 3 | filled | 06:30.7 |
9 | 1 | active_not_posted | 58:18.0 |
9 | 2 | active_posted | 58:18.0 |
9 | 3 | on_hold | 34:42.7 |
9 | 4 | cancelled | 05:27.3 |
9 | 5 | active_not_posted | 05:29.8 |
9 | 6 | cancelled | 05:32.2 |
Output2 spreadsheet
user_id | Full_event_sequence | Sequence_start | Sequence_end |
7 | active_not_posted - active_posted - filled | 22:11.8 | 06:30.7 |
9 | active_not_posted - active_posted - on_hold - cancelled - active_not_posted - cancelled | 58:18.0 | 05:32.2 |
Thank you,
Eugene
Solved! Go to Solution.
hi @EugeneY
Your times have no hour info, so that there's no way to sort the data that will swap event_id 1745 and 1746 to get them in your output order, without swapping event_id 3 and 91 at the same time. Check the output of @DavidP's workflow where the last 2 items are not in your output order
Is the Event_ID unique and ascending for each user_ID? If so, you could order by User_Id and then Event_Id giving you the proper order in your output
Dan
Hi Danilang/Castor,
My input data in excel file contains the data with date and hours (e.g., 3/1/2017 4:22:12 PM). I apologize for confusion made by my copy-paste in the body of my question.
To note, user_id and event_id contains unique values where the event_id value is assigned by the application in the ascending order the status change was made and is expected to correlate with time of the change.
Thank you,
Eugene
Hi Dave,
Thanks for your solution.
To be 100% sure that the sequence of events is captured, I added a sort step with user_id ascending at the first step and created_at date at the second step.
See my snapshot.
The ETL produces expected results as in the small test sample I shared with you and on my real data set.
Appreciate your help,
Eugene