Got the basics covered. I created additional flags because some hashtags were embedded in other words, e.g. "SDGS".
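For anyone curious what an "embedded hashtag" flag might look like outside the workflow, here is a minimal Python sketch. It assumes a plain case-insensitive substring match; the tag names `SDG` and `GlobalGoals` and the sample tweet are made up for illustration and are not from the challenge data.

```python
def hashtag_flags(text, tags):
    """Return {tag: bool}; True when the tag appears anywhere in the text,
    even inside a longer word such as "SDGS" (case-insensitive)."""
    lowered = text.lower()
    return {tag: tag.lower() in lowered for tag in tags}


# Hypothetical tweet text and tag list, for illustration only.
flags = hashtag_flags("Progress on the #SDGS this year", ["SDG", "GlobalGoals"])
print(flags)
```

A substring match is deliberately loose, so it will also flag unrelated words that happen to contain the tag. That trade-off may or may not suit the real data.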
I took a step back and looked at the data in the various files before starting. After determining that they had similar structures, I joined the files, removed redundancies, and then grouped the data into three streams. I was interested to see who the most active users were (by number of tweets, retweets, followers, etc.); the topics that garnered the most interest; and the most active locations (tying topics to geography).
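The join, dedupe, and group steps described above could be sketched roughly like this in Python. This is only an illustration under assumptions: the field names (`user`, `topic`, `location`) and the sample records are hypothetical, and the real files carry more columns.

```python
from collections import Counter


def combine(files):
    """Concatenate record lists that share a structure, dropping exact duplicates."""
    seen = set()
    combined = []
    for records in files:
        for rec in records:
            key = tuple(sorted(rec.items()))  # hashable fingerprint of the record
            if key not in seen:
                seen.add(key)
                combined.append(rec)
    return combined


# Hypothetical sample data standing in for two of the input files.
file_a = [
    {"user": "ann", "topic": "climate", "location": "Oslo"},
    {"user": "bob", "topic": "health", "location": "Lima"},
]
file_b = [
    {"user": "ann", "topic": "climate", "location": "Oslo"},  # duplicate of file_a
    {"user": "ann", "topic": "health", "location": "Oslo"},
]

records = combine([file_a, file_b])

# Three streams: activity by user, by topic, and by location tied to topic.
by_user = Counter(r["user"] for r in records)
by_topic = Counter(r["topic"] for r in records)
by_location = Counter((r["location"], r["topic"]) for r in records)

print(by_user.most_common(), by_topic.most_common(), by_location.most_common())
```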
I pared down the data to include only the records with multiple actions. Although the data could have been regrouped to create a single file with the relevant data, I left the data in their respective groups. Also, for the locations, I intentionally did not spend a lot of time on cleanup. (That's a whole other project.)
In retrospect, I could probably have done more cleanup on the joined files, minimizing downstream work.
I considered it done once I could get all the data from the 10 files.
I took a two-fold approach.
First, I wanted to find the top influencers to see whether any were especially heavy influencers. In the end, I settled on the top 10 influencers, as shown in the spoiler below.
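A top-N pick like this can be sketched in Python with `heapq.nlargest`. The post does not say which metric defined an influencer, so ranking by follower count here is an assumption, and the names and numbers are invented.

```python
import heapq


def top_influencers(users, n=10, key="followers"):
    """Return the n users with the largest value of `key`, highest first."""
    return heapq.nlargest(n, users, key=lambda u: u[key])


# Hypothetical users; the ranking metric (followers) is an assumption.
users = [
    {"name": "ann", "followers": 1200},
    {"name": "bob", "followers": 50},
    {"name": "cy", "followers": 9000},
]

top = top_influencers(users, n=2)
print([u["name"] for u in top])
```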
Second, in the spirit of the current election climate, I wanted to check for any Twitter bots feeding the messages. Unfortunately, the user join date, which would have been one sign, was not included. Instead, I calculated the percentage of each user's total tweets that were devoted to these topics. Any user with more than 50% of their total tweets dedicated to this topic was filtered out as a 'bot'. 137 users were found.
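The share-of-tweets check could look something like the Python sketch below. The field names `topic_tweets` and `total_tweets` and the sample users are hypothetical; only the "more than 50%" threshold comes from the description above.

```python
def flag_bots(users, threshold=0.5):
    """Return (name, share) for users whose on-topic share of tweets exceeds threshold."""
    flagged = []
    for u in users:
        total = u["total_tweets"]
        if total == 0:
            continue  # no tweets at all, so no share to compute
        share = u["topic_tweets"] / total
        if share > threshold:  # strictly greater than 50%
            flagged.append((u["name"], share))
    return flagged


# Hypothetical users, for illustration only.
users = [
    {"name": "ann", "topic_tweets": 30, "total_tweets": 400},
    {"name": "bot1", "topic_tweets": 90, "total_tweets": 100},
    {"name": "cy", "topic_tweets": 50, "total_tweets": 100},  # exactly 50%, not flagged
]

flagged = flag_bots(users)
print(flagged)
```

A high share on its own is a weak signal, since a new or single-issue account would also trip it. The result is better read as a list of candidates than as confirmed bots.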
Next steps for visualization could be to create tables and render output.