Weekly Challenges


Challenge #89: Analyzing Social Data

JosephSerpis
17 - Castor

Challenge Completed

Ozzy_Campos
8 - Asteroid

Definitely enjoyed this one. I cleaned the data, removed '?' characters, sorted the original tweets by favorites, and counted hashtags. There is a lot more analysis that could be done on the tweets themselves, and plenty of chaos to figure out. The Sustainable Development Goals played a huge part in all the tweets, and certain countries showed up more than others.

kelly_gilbert
13 - Pulsar

Here's my solution for week #89. I found a few interesting issues while exploring the data (in the spoiler tag).  

For next week (#90), I'm going to challenge myself to use Alteryx's reporting tools, so I'll keep the analysis pretty basic. I plan to look at:

  1. The frequency of each hashtag
  2. The count of distinct users using each hashtag
  3. The timing (did the hashtags peak at different times?)
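The three planned measures boil down to grouped counts. A minimal standard-library Python sketch, assuming the data has already been parsed to one row per tweet/hashtag pair; the field names `user`, `hashtag` and `created` and the sample rows are made up for illustration, not taken from the challenge files:

```python
from collections import Counter, defaultdict
from datetime import datetime

# Made-up sample rows: one row per (tweet, hashtag) pair.
# The field names user/hashtag/created are hypothetical.
rows = [
    {"user": "a", "hashtag": "globalgoals", "created": datetime(2017, 10, 1, 9, 15)},
    {"user": "b", "hashtag": "globalgoals", "created": datetime(2017, 10, 1, 9, 40)},
    {"user": "a", "hashtag": "globalgoals", "created": datetime(2017, 10, 1, 14, 5)},
    {"user": "c", "hashtag": "act4sdgs", "created": datetime(2017, 10, 2, 18, 30)},
]

def summarize(rows):
    """Return hashtag frequency, distinct users per hashtag, and peak hour per hashtag."""
    freq = Counter(r["hashtag"] for r in rows)
    users = defaultdict(set)
    by_hour = defaultdict(Counter)
    for r in rows:
        users[r["hashtag"]].add(r["user"])
        # Truncate each timestamp to the hour to bucket activity over time
        hour = r["created"].replace(minute=0, second=0, microsecond=0)
        by_hour[r["hashtag"]][hour] += 1
    distinct_users = {tag: len(u) for tag, u in users.items()}
    # The busiest hour bucket for each hashtag
    peak = {tag: c.most_common(1)[0][0] for tag, c in by_hour.items()}
    return freq, distinct_users, peak

freq, distinct_users, peak = summarize(rows)
print(freq.most_common())
print(distinct_users)
print(peak)
```

Bucketing by hour is an arbitrary choice; truncating to the day instead would show peaks on a coarser scale.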
Spoiler
Some findings from exploration in week 89:

1 - If a tweet had multiple hashtags, it may be duplicated across the files. The ID field is unique.
2 - All of the CSV files have the same schema.
3 - Hashtags are not always capitalized the same way; you may want to convert them to all uppercase or all lowercase if using case-sensitive formulas/tools.
4 - The Tweet field is sometimes truncated, and in some cases the hashtags were cut off. If a hashtag does not appear in the Tweet field, then it also does not appear in the Hashtag field. As a result, the Hashtag field is sometimes null.
5 - Since the tweets were harvested by hashtag, every tweet in a file should contain that file's hashtag. For example, every tweet in the 'globalgoals' file should contain the #globalgoals hashtag. We can rebuild the Hashtag field to include the 10 hashtags of interest, but if any other hashtags were truncated, we don't know about them.

Here is an example. This tweet (ID# 914489241266278000) does have the #act4sdgs hashtag, but the hashtag is truncated from the Tweet field and is therefore missing from the Hashtag field in the act4sdgs CSV file.

Capture.PNG

Capture1.PNG
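Findings 1, 3 and 5 suggest a repair step: merge duplicate rows by tweet ID, normalize case, and add each file's own hashtag back. A hedged Python sketch, assuming the file name stem is the harvest hashtag (e.g. `act4sdgs.csv`) and that the Hashtag field is comma-separated; `rebuild_hashtags` is a hypothetical helper name:

```python
from pathlib import PurePath

def rebuild_hashtags(records):
    """records: iterable of (filename, tweet_id, hashtag_field) tuples.

    Returns {tweet_id: set of lowercase hashtags}. A tweet that appears in
    several files is merged under its unique ID, and each file's own hashtag
    is added back even if it was truncated out of the Hashtag field.
    """
    tags_by_id = {}
    for filename, tweet_id, hashtag_field in records:
        tags = tags_by_id.setdefault(tweet_id, set())
        # Assumption: the file name (e.g. "act4sdgs.csv") is the harvest hashtag
        tags.add(PurePath(filename).stem.lower())
        # Assumption: comma-separated hashtags; the field may be empty or None
        for tag in (hashtag_field or "").split(","):
            tag = tag.strip().lstrip("#").lower()
            if tag:
                tags.add(tag)
    return tags_by_id
```

IDs are kept as strings here: an 18-digit tweet ID has more digits than a 64-bit float can represent exactly, so reading it as a double can silently change it. As finding 5 notes, truncated hashtags outside the 10 harvest hashtags still cannot be recovered this way.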
danilang
19 - Altair

Simple data cleanup.

Spoiler
Solution 89.png
kat
12 - Quasar
Spoiler
Challenge #89.PNG
Vinutha
8 - Asteroid

Kept it simple: just data cleansing and summarizing. It could also show tweets by region, unique users, etc.

DavidP
17 - Castor

I kept mine pretty simple - will decide how to summarize, sort and sample the data in the next challenge.

 

1. Parsed the hashtags to rows

2. Combined date and time

3. Replaced all null hashtags with filename

4. Changed all hashtags to the same case

5. Filtered out hashtags that contain '?'

6. Removed duplicates based on ID, Tweet and Hashtag
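The six steps above can be expressed as a single pass in Python. This is only a sketch of the same sequence, not DavidP's workflow; the row keys (`file`, `ID`, `Tweet`, `Hashtag`, `Date`, `Time`), the comma-separated Hashtag format and the date/time formats are all assumptions:

```python
from datetime import datetime
from pathlib import PurePath

def clean(rows):
    """Apply the six cleanup steps to rows read from one or more CSV files.

    Each row is a dict with hypothetical keys: file, ID, Tweet, Hashtag,
    Date, Time. The date/time formats below are assumptions.
    """
    seen = set()
    out = []
    for row in rows:
        # 1. Parse the hashtags to rows (assumed comma-separated)
        tags = (row.get("Hashtag") or "").split(",")
        # 2. Combine date and time into a single timestamp
        created = datetime.strptime(f'{row["Date"]} {row["Time"]}', "%Y-%m-%d %H:%M:%S")
        for tag in tags:
            tag = tag.strip()
            # 3. Replace a null hashtag with the file's hashtag
            if not tag:
                tag = PurePath(row["file"]).stem
            # 4. Change all hashtags to the same case
            tag = tag.lower()
            # 5. Drop hashtags containing '?' (assumed to be characters lost in export)
            if "?" in tag:
                continue
            # 6. Remove duplicates based on ID, Tweet and Hashtag
            key = (row["ID"], row["Tweet"], tag)
            if key in seen:
                continue
            seen.add(key)
            out.append({"ID": row["ID"], "Tweet": row["Tweet"],
                        "Hashtag": tag, "Created": created})
    return out
```

Step 3 runs before step 5, matching the order listed, so a tweet whose only hashtag contains '?' ends up with no hashtag row at all.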

 

dsmdavid
11 - Bolide

I wanted to use the Cognitive Services analytics tool, but it seems Azure services are no longer free, so I took the chance to brush up my rusty Python and use the new tool.

Spoiler

First bringing all the tweets into a yxdb file:

input.png
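Outside Alteryx, unioning the CSV files (which all share one schema, as noted earlier in the thread) can be sketched with Python's `csv` module. This produces plain Python dicts rather than a .yxdb file; `combine_csvs` is a hypothetical helper and the in-memory demo files are made up:

```python
import csv
import io
from typing import Dict, List, TextIO

def combine_csvs(sources: Dict[str, TextIO]) -> List[dict]:
    """Union several CSV files that share one schema, tagging each row with its source file name."""
    combined = []
    for filename, handle in sources.items():
        for row in csv.DictReader(handle):
            # Keep the source so the harvest hashtag can be recovered later
            row["file"] = filename
            combined.append(row)
    return combined

# Demo with in-memory files standing in for the challenge CSVs (contents are made up)
demo = combine_csvs({
    "globalgoals.csv": io.StringIO("ID,Tweet\n1,hello\n"),
    "act4sdgs.csv": io.StringIO("ID,Tweet\n2,world\n"),
})
print(demo)
```

For the real files, the `sources` mapping could be built from `glob.glob("*.csv")`, opening each file with `newline=""` as the `csv` module documentation recommends.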

Then a bit of processing: removing duplicates and getting each tweet's "polarity".

sa.png

Thanks to Zoe Wilkinson Saldaña for the detailed how-to on Python and VADER.
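The Python code itself only appears in the screenshots, so here is a hedged sketch of the same idea rather than the posted solution: deduplicate by tweet ID, score each tweet once, and bucket VADER's compound score into a label using the ±0.05 thresholds suggested in the VADER documentation. The scorer is passed in as a function so the sketch runs without the third-party package; `label` and `score_unique` are hypothetical names:

```python
from typing import Callable, Dict, Iterable, Tuple

def label(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER-style compound score in [-1, 1] to a polarity label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

def score_unique(tweets: Iterable[Tuple[str, str]],
                 scorer: Callable[[str], float]) -> Dict[str, Tuple[float, str]]:
    """Deduplicate (tweet_id, text) pairs by ID, then score each text once."""
    results: Dict[str, Tuple[float, str]] = {}
    for tweet_id, text in tweets:
        if tweet_id in results:
            continue  # the same tweet can appear in several hashtag files
        compound = scorer(text)
        results[tweet_id] = (compound, label(compound))
    return results
```

With the `vaderSentiment` package installed, the scorer could be built as `analyzer = SentimentIntensityAnalyzer()` (imported from `vaderSentiment.vaderSentiment`) and passed as `lambda t: analyzer.polarity_scores(t)["compound"]`.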

jssandom
8 - Asteroid

Kept it quite simple and counted the number of tweets by hashtag. Also used the Alteryx Interactive Chart for the first time!

 

Number of Tweets.png

JoBen
11 - Bolide

Cheers!

Spoiler
I chose to stick with the Alteryx reporting suite. Here are the visuals that I came up with.

Challenge_89.png