Simple union to combine all files and a small bit of data cleaning,
then summarizing the important tweets by retweet count.
I decided to union the data and aggregate it by user: how many related tweets each user made, their exposure in terms of followers and retweets, and which hashtags they used.
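For anyone who wants to try the same per-user aggregation outside a workflow tool, here is a minimal Python sketch. The field names (`user`, `followers`, `retweets`, `hashtags`) and the sample rows are assumptions made for illustration, not the challenge's actual schema or data.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the unioned tweet files.
tweets = [
    {"user": "alice", "followers": 1200, "retweets": 5, "hashtags": ["#SDGs", "#climate"]},
    {"user": "bob", "followers": 300, "retweets": 0, "hashtags": ["#SDGs"]},
    {"user": "alice", "followers": 1200, "retweets": 2, "hashtags": ["#health"]},
]


def aggregate_by_user(rows):
    """Summarize tweet count, follower exposure, retweets and hashtags per user."""
    summary = defaultdict(
        lambda: {"tweets": 0, "followers": 0, "retweets": 0, "hashtags": set()}
    )
    for row in rows:
        entry = summary[row["user"]]
        entry["tweets"] += 1
        # Follower counts repeat on every tweet, so keep the largest value seen
        # rather than summing them.
        entry["followers"] = max(entry["followers"], row["followers"])
        entry["retweets"] += row["retweets"]
        entry["hashtags"].update(row["hashtags"])
    return dict(summary)


per_user = aggregate_by_user(tweets)
```

Taking the maximum follower count per user is one possible design choice; using the most recent value would work just as well if the rows carry a timestamp.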
I decided to look at some summary statistics and identify the most active users and the most used hashtags. There is an average of 3.8 hashtags per tweet, and the most commonly used hashtag is #SDGs. As it turns out, the top 19 users are responsible for 10% of total tweets. If we limit this to original content (filtering out retweets), there are 46 users who generate the top 10% of content. Looking at what gets retweeted, there is one user - ONE! - whose retweets account for 8.5% of all retweets. Next steps for me would be locating these users on a map and generating some heatmaps for the hashtags.
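The statistics above (average hashtags per tweet, most used hashtag, and how few users cover a given share of tweets) could be computed roughly like this in Python. This is only a sketch: the field names and the sample rows are made up for illustration and will not reproduce the figures quoted in the post.

```python
from collections import Counter


def hashtag_stats(rows):
    """Return (average hashtags per tweet, most common hashtag)."""
    if not rows:
        return 0.0, None
    counts = Counter(tag for row in rows for tag in row["hashtags"])
    average = sum(counts.values()) / len(rows)
    top_tag = counts.most_common(1)[0][0] if counts else None
    return average, top_tag


def users_covering_share(rows, share):
    """Smallest number of top users whose tweets reach `share` of all tweets."""
    per_user = Counter(row["user"] for row in rows)
    target = share * len(rows)
    covered = 0
    # Walk users from most to least active until the target share is reached.
    for n, (_, count) in enumerate(per_user.most_common(), start=1):
        covered += count
        if covered >= target:
            return n
    return len(per_user)


# Hypothetical sample: 10 tweets from 3 users.
sample = (
    [{"user": "alice", "hashtags": ["#SDGs", "#climate"]}] * 5
    + [{"user": "bob", "hashtags": ["#SDGs"]}] * 3
    + [{"user": "carol", "hashtags": ["#health"]}] * 2
)

average, top_tag = hashtag_stats(sample)
top_half = users_covering_share(sample, 0.5)
```

To restrict the count to original content, filter out retweet rows before calling `users_covering_share`; how a retweet is flagged depends on the export, so that step is left out here.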
I kept it pretty simple but aimed to understand the relationship between followers and total tweets. I could probably find some more value by removing outliers, but the outliers were actually what I found interesting. The accounts with the most tweets are in the lowest follower group, and the opposite holds for the accounts with the most followers, which tweet very little.
@JefBus Your solution to challenge #89 is awesome. Great job.