Weekly Challenges

JosephSerpis · ‎02-26-2018

Challenge Completed

Ozzy_Campos · ‎02-28-2018

Definitely enjoyed this one. I cleaned the data, removed '?', sorted by the most-favorited original tweets, and counted hashtags. There's a lot more analysis that could be done within the tweets themselves, and lots of chaos to figure out. The Sustainable Development Goals played a huge part in all the tweets, and certain countries showed up more than others.

kelly_gilbert · ‎04-02-2018

Here's my solution for week #89. I found a few interesting issues while exploring the data (in the spoiler tag).

For next week (#90): I'm going to challenge myself to use Alteryx's reporting tools, so I'm going to keep the analysis pretty basic. I'm going to look at:

The frequency of each hashtag
The count of distinct users using each hashtag
The timing (did the hashtags peak at different times?)
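
For anyone who wants to prototype those three measures outside Alteryx first, here is a rough Python sketch. It assumes the data has already been reshaped to one row per tweet/hashtag pair, and the field names `Hashtag`, `User`, and `Date` are placeholders of mine, not necessarily the names in the challenge files.

```python
from collections import Counter, defaultdict


def summarize(rows):
    """Summarize rows that each hold one tweet/hashtag pair.

    The keys "Hashtag", "User" and "Date" are assumed field names.
    """
    tweet_counts = Counter()          # hashtag -> number of rows
    users = defaultdict(set)          # hashtag -> distinct users
    per_day = defaultdict(Counter)    # hashtag -> day -> number of rows

    for row in rows:
        tag = row["Hashtag"].lower()  # ignore capitalization differences
        tweet_counts[tag] += 1
        users[tag].add(row["User"])
        per_day[tag][row["Date"]] += 1

    return {
        tag: {
            "tweets": tweet_counts[tag],
            "distinct_users": len(users[tag]),
            # the day with the most rows for this hashtag
            "peak_day": per_day[tag].most_common(1)[0][0],
        }
        for tag in tweet_counts
    }
```

Comparing `peak_day` across hashtags is only a crude first look at the timing question; a per-day series would show more.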

Spoiler

Some findings from exploration in week 89:

1 - if a tweet had multiple hashtags, the tweet may be duplicated across the files. The ID field is unique.
2 - all of the csv files have the same schema.
3 - hashtags may not always be capitalized in the same way; you may want to convert to all upper or all lowercase if using case-sensitive formulas/tools
4 - the Tweet field is sometimes truncated, and in some cases, the hashtags were cut off. If the hashtag does not appear in the Tweet field, then it also does not appear in the Hashtag field. As a result, sometimes the Hashtag field is null.
5 - since we know the tweets were harvested based on hashtag, we know that every tweet in a file should contain that file's hashtag. For example, every tweet in the 'globalgoals' file should contain the #globalgoals hashtag. We can rebuild the Hashtag field to include the 10 hashtags of interest, but if any other hashtags were truncated, we don't know about them.

Here is an example. This tweet (ID# 914489241266278000) does have the #act4sdgs hashtag, but it is truncated from the Tweet field and thus not present in the Hashtag field in the act4sdgs csv file.
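
Findings 1, 3 and 5 can be sketched in Python roughly as follows. This is only an illustration, not the workflow itself: `FileHashtag` is a hypothetical field holding the hashtag of the file each row came from, and `ID` and `Tweet` are assumed to match the csv column names.

```python
import re

HASHTAG_RE = re.compile(r"#(\w+)")


def rebuild_hashtags(tweet_text, file_hashtag):
    """Return lowercase hashtags found in the text, plus the file's own hashtag."""
    tags = [t.lower() for t in HASHTAG_RE.findall(tweet_text or "")]
    file_tag = file_hashtag.lower().lstrip("#")
    if file_tag not in tags:
        # the tweet was harvested by this hashtag, so it must have had it
        tags.append(file_tag)
    # drop repeats while keeping first-seen order
    return list(dict.fromkeys(tags))


def dedupe_by_id(rows):
    """Keep one row per tweet ID, merging the hashtags of duplicate rows."""
    merged = {}
    for row in rows:
        tags = rebuild_hashtags(row["Tweet"], row["FileHashtag"])
        if row["ID"] in merged:
            existing = merged[row["ID"]]["Hashtags"]
            merged[row["ID"]]["Hashtags"] = list(dict.fromkeys(existing + tags))
        else:
            merged[row["ID"]] = {
                "ID": row["ID"],
                "Tweet": row["Tweet"],
                "Hashtags": tags,
            }
    return list(merged.values())
```

One caveat: if the Tweet text is cut off in the middle of a hashtag, the regular expression will pick up the partial tag as if it were a real one, so those would still need a manual check.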

danilang · ‎07-30-2018

Simple data clean up

Spoiler

kat · ‎08-22-2018

Spoiler

Vinutha · ‎09-07-2018

Kept it simple. Just data cleansing and summarizing. Can also show tweets by region, unique users, etc.

DavidP · ‎09-12-2018

I kept mine pretty simple - will decide how to summarize, sort and sample the data in the next challenge.

1. Parsed the hashtags to rows

2. Combined date and time

3. Replaced all null hashtags with filename

4. Changed all hashtags to the same case

5. Filtered out where hashtags contain ?

6. Removed dups based on ID, Tweet and Hashtag
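
For comparison, the six steps above could look something like this in plain Python. It is a sketch under assumptions: the field names (`ID`, `Tweet`, `Hashtag`, `Date`, `Time`, `FileName`) are guesses, the Hashtag field is assumed to be comma-separated, and date and time are simply joined as text rather than parsed.

```python
import os


def clean_rows(rows):
    """Return one cleaned row per distinct (ID, Tweet, Hashtag) combination."""
    seen = set()
    cleaned = []
    for row in rows:
        # step 3: fall back to the file name (minus extension) when Hashtag is null/empty
        file_tag = os.path.splitext(os.path.basename(row["FileName"]))[0]
        raw = row.get("Hashtag") or file_tag

        # step 1: one output row per hashtag (comma-separated is an assumption)
        for tag in raw.split(","):
            # step 4: same case for every hashtag
            tag = tag.strip().lstrip("#").lower()
            # step 5: skip empty tags and tags containing '?'
            if not tag or "?" in tag:
                continue
            # step 6: remove duplicates on ID, Tweet and Hashtag
            key = (row["ID"], row["Tweet"], tag)
            if key in seen:
                continue
            seen.add(key)
            cleaned.append({
                "ID": row["ID"],
                "Tweet": row["Tweet"],
                "Hashtag": tag,
                # step 2: combine date and time into one text field
                "DateTime": f'{row["Date"]} {row["Time"]}',
            })
    return cleaned
```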

dsmdavid · ‎09-27-2018

I wanted to use the cognitive services analytics tool, but it seems Azure services are no longer free, so this was a chance to brush up on my rusty Python and use the new tool.

Spoiler

First bringing all the tweets into a yxdb file:

Then a bit of processing, removing duplicates, and getting their "polarity"

Thanks to Zoe Wilkinson Saldaña for the detailed how-to on Python and Vader

jssandom · ‎10-26-2018

Kept it quite simple and counted the number of tweets by hashtag. Also used the Alteryx Interactive Chart for the first time!

JoBen · ‎11-07-2018

Cheers!

Spoiler

I chose to stick with the Alteryx reporting suite. Here are the visuals that I came up with.

Challenge #89: Analyzing Social Data