Definitely enjoyed this one. I cleaned the data, removed '?', did some sorting based on favorite original tweets, and counted hashtags. There's definitely a lot more analysis that could be done within the tweets themselves, and lots of chaos to figure out. The Sustainable Development Goals played a huge part in all the tweets, and certain countries showed up more than others.
Here's my solution for week #89. I found a few interesting issues while exploring the data (in the spoiler tag).
For next week (#90): I'm going to challenge myself to use Alteryx's reporting tools, so I'm going to keep the analysis pretty basic. I'm going to look at:
Simple data clean up
I kept mine pretty simple - will decide how to summarize, sort and sample the data in the next challenge.
1. Parsed the hashtags to rows
2. Combined date and time
3. Replaced all null hashtags with filename
4. Changed all hashtags to the same case
5. Filtered out rows where the hashtag contains '?'
6. Removed duplicates based on ID, Tweet and Hashtag
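
For anyone who wants to try the same clean-up outside of Alteryx, here is a rough Python sketch of the six steps above. It is not the workflow itself: the field names (id, tweet, hashtags, date, time, filename), the comma-separated hashtag format and the date format are all assumptions made for illustration. The null replacement is done just before splitting, which gives the same rows as doing it afterwards.

```python
from datetime import datetime


def clean_tweets(records):
    """Apply the six clean-up steps to a list of tweet dicts.

    Each record is assumed (hypothetically) to have the keys
    id, tweet, hashtags, date, time and filename.
    """
    seen = set()
    cleaned = []
    for rec in records:
        # Step 2: combine the separate date and time fields into one timestamp.
        created = datetime.strptime(
            f"{rec['date']} {rec['time']}", "%Y-%m-%d %H:%M:%S"
        )
        # Step 3: fall back to the file name when there are no hashtags.
        raw = rec["hashtags"] or rec["filename"]
        # Step 1: one output row per hashtag.
        for tag in raw.split(","):
            # Step 4: put every hashtag in the same case.
            tag = tag.strip().lower()
            # Step 5: drop hashtags that contain '?'.
            if not tag or "?" in tag:
                continue
            # Step 6: remove duplicates on (ID, Tweet, Hashtag).
            key = (rec["id"], rec["tweet"], tag)
            if key in seen:
                continue
            seen.add(key)
            cleaned.append(
                {"id": rec["id"], "tweet": rec["tweet"],
                 "hashtag": tag, "created": created}
            )
    return cleaned


# Made-up sample rows, only to exercise the function.
sample = [
    {"id": 1, "tweet": "Hello #SDGs #Climate", "hashtags": "SDGs,Climate",
     "date": "2017-09-20", "time": "14:05:00", "filename": "sdg_tweets"},
    {"id": 1, "tweet": "Hello #SDGs #Climate", "hashtags": "sdgs",
     "date": "2017-09-20", "time": "14:05:00", "filename": "sdg_tweets"},
    {"id": 2, "tweet": "No tags here", "hashtags": None,
     "date": "2017-09-21", "time": "09:30:00", "filename": "sdg_tweets"},
    {"id": 3, "tweet": "Garbled", "hashtags": "???,GlobalGoals",
     "date": "2017-09-22", "time": "18:45:10", "filename": "sdg_tweets"},
]

rows = clean_tweets(sample)
```

Lower-casing before the duplicate check matters here: "SDGs" and "sdgs" on the same tweet collapse into a single row only because step 4 runs before step 6.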
I wanted to use the cognitive service analytics tool, but it seems Azure services are no longer free, so this was a chance to brush up my rusty Python and use the new tool.
First bringing all the tweets into a yxdb file:
Then a bit of processing, removing duplicates, and getting their "polarity"
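
A minimal sketch of that dedupe-then-score step in plain Python, for illustration only. The real scoring in this post comes from VADER; `toy_polarity` and `TOY_LEXICON` below are made-up stand-ins so the snippet runs without extra packages, and `score_tweets` is a hypothetical helper name. The commented lines show how VADER's compound score could be plugged in instead, assuming the vaderSentiment package is installed.

```python
def dedupe(tweets):
    """Drop repeated tweet texts, keeping the first occurrence and the order."""
    return list(dict.fromkeys(tweets))


# Hypothetical toy lexicon; a stand-in for a real sentiment lexicon.
TOY_LEXICON = {"great": 1.0, "love": 1.0, "bad": -1.0, "awful": -1.0}


def toy_polarity(text):
    """Average the toy lexicon scores of the words found in the text."""
    words = [w.strip(".,!?#").lower() for w in text.split()]
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def score_tweets(tweets, polarity_fn=toy_polarity):
    """Return (tweet, polarity) pairs for the de-duplicated tweets."""
    return [(t, polarity_fn(t)) for t in dedupe(tweets)]


# With VADER installed, the compound score could be used instead:
# from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
# analyzer = SentimentIntensityAnalyzer()
# score_tweets(tweets, lambda t: analyzer.polarity_scores(t)["compound"])

scored = score_tweets(["Love the goals!", "Love the goals!", "Awful traffic"])
```

Passing the scorer in as a function keeps the duplicate handling separate from whichever sentiment library ends up being used.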
Thanks to Zoe Wilkinson Saldaña for the detailed how-to on Python and Vader
Cheers!