Hey there,
I would like to analyze data I downloaded from my Facebook Messenger, however the format the data arrived in is puzzling for me.
You can see the raw data in the first column. I described it in the second column.
You'll notice that most of the data follows the pattern of Sender > Message> Reaction (if any) > Date.
When a photo or GIF is sent, the cell is empty.
When there is no reaction to a message or a photo/gif, the data follows the pattern of Sender > Message> Date.
Raw data | My classification |
Person A | Sender |
Hello | Message |
❤Person B | Reaction |
Jun 09, 2023 9:21:51pm | Date |
Person A | Sender |
Photo (when a photo is sent, the row is empty) | |
❤Person B | Reaction |
Jun 09, 2023 9:21:38pm | Date |
Person B | Sender |
Hey there | Message |
Jun 09, 2023 9:10:51pm | Date |
Person B | Sender |
Good morning | Message |
Jun 09, 2023 9:10:45pm | Date |
Person A | Sender |
Top of the morning | Message |
❤Person B | Reaction |
Jun 09, 2023 9:09:37pm | Date |
As I have hundreds of thousands rows of data, doing the manual classification for all rows is not an option :)
I scoured the community for an answer but I could only find solutions that handled such data when the data followed a more regular pattern than here.
Could you please advise me on the steps that would get me to the following data structure?
Thank you in advance :)
Sender | Message | Reaction | Date |
Person A | Hello | ❤Person B | Jun 09, 2023 9:21:51pm |
Person A | Picture/gif | ❤Person B | Jun 09, 2023 9:21:38pm |
Person B | Hey there | Jun 09, 2023 9:10:51pm | |
Person B | Good morning | Jun 09, 2023 9:10:45pm | |
Person A | Top of the morning | ❤Person B | Jun 09, 2023 9:09:37pm |
Solved! Go to Solution.
Hi @birdie124 ,
Here is one way of doing this.
The approach is to simply fill the category one by one using IF clauses.
As the sample data is small, there might be still some exceptional cases with larger data set.
So please test with larger data and modify the formula if necessary.
Workflow
Formula tool
Category = IF REGEX_Match([data], "[^[:ascii:]].*") THEN "Reaction" ELSE Null() ENDIF
Category = IF IsNull([Category]) AND REGEX_Match([data], "\w{3}\s\d{2},\s\d{4}.*") THEN "Date" ELSE [Category] ENDIF
Category = IF IsNull([data]) THEN "Photo" ELSE [Category] ENDIF
Multi-Row Formula
Category =
IF IsEmpty([Row-1:data]) AND IsEmpty([Row-1:Category]) OR [Row-1:Category] = "Date" THEN "Sender" ELSE [Category] ENDIF
Formula
Category = IF IsNull([Category]) THEN "Message" ELSE [Category] ENDIF
Multi-Row Formula
IF [Category] = "Sender" THEN [Row-1:RecordID] + 1 ELSE [Row-1:RecordID] ENDIF
Output
Works like a charm! Thank you very much for your solution, Yoshiro! :)
User | Count |
---|---|
106 | |
82 | |
70 | |
54 | |
40 |