Analyzing irregular data from Facebook Messenger
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hey there,
I would like to analyze data I downloaded from my Facebook Messenger, however the format the data arrived in is puzzling for me.
You can see the raw data in the first column. I described it in the second column.
You'll notice that most of the data follows the pattern of Sender > Message> Reaction (if any) > Date.
When a photo or GIF is sent, the cell is empty.
When there is no reaction to a message or a photo/gif, the data follows the pattern of Sender > Message> Date.
Raw data | My classification |
Person A | Sender |
Hello | Message |
❤Person B | Reaction |
Jun 09, 2023 9:21:51pm | Date |
Person A | Sender |
Photo (when a photo is sent, the row is empty) | |
❤Person B | Reaction |
Jun 09, 2023 9:21:38pm | Date |
Person B | Sender |
Hey there | Message |
Jun 09, 2023 9:10:51pm | Date |
Person B | Sender |
Good morning | Message |
Jun 09, 2023 9:10:45pm | Date |
Person A | Sender |
Top of the morning | Message |
❤Person B | Reaction |
Jun 09, 2023 9:09:37pm | Date |
As I have hundreds of thousands rows of data, doing the manual classification for all rows is not an option :)
I scoured the community for an answer but I could only find solutions that handled such data when the data followed a more regular pattern than here.
Could you please advise me on the steps that would get me to the following data structure?
Thank you in advance :)
Sender | Message | Reaction | Date |
Person A | Hello | ❤Person B | Jun 09, 2023 9:21:51pm |
Person A | Picture/gif | ❤Person B | Jun 09, 2023 9:21:38pm |
Person B | Hey there | Jun 09, 2023 9:10:51pm | |
Person B | Good morning | Jun 09, 2023 9:10:45pm | |
Person A | Top of the morning | ❤Person B | Jun 09, 2023 9:09:37pm |
Solved! Go to Solution.
- Labels:
- Datasets
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @birdie124 ,
Here is one way of doing this.
The approach is to simply fill the category one by one using IF clauses.
As the sample data is small, there might be still some exceptional cases with larger data set.
So please test with larger data and modify the formula if necessary.
Workflow
Formula tool
Category = IF REGEX_Match([data], "[^[:ascii:]].*") THEN "Reaction" ELSE Null() ENDIF
Category = IF IsNull([Category]) AND REGEX_Match([data], "\w{3}\s\d{2},\s\d{4}.*") THEN "Date" ELSE [Category] ENDIF
Category = IF IsNull([data]) THEN "Photo" ELSE [Category] ENDIF
Multi-Row Formula
Category =
IF IsEmpty([Row-1:data]) AND IsEmpty([Row-1:Category]) OR [Row-1:Category] = "Date" THEN "Sender" ELSE [Category] ENDIF
Formula
Category = IF IsNull([Category]) THEN "Message" ELSE [Category] ENDIF
Multi-Row Formula
IF [Category] = "Sender" THEN [Row-1:RecordID] + 1 ELSE [Row-1:RecordID] ENDIF
Output
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Works like a charm! Thank you very much for your solution, Yoshiro! :)
![](/skins/images/303A6AD65FCA043F1D2938424191616C/responsive_peak/images/icon_anonymous_message.png)