
Weekly Challenge


Challenge #89: Analyzing Social Data

Alteryx Certified Partner

Challenge Completed


Definitely enjoyed this one. I cleaned the data, removed the '?' characters, sorted by favorites on original tweets, and counted hashtags. There is a lot more analysis that could be done within the tweets themselves, and plenty of chaos to figure out. The Sustainable Development Goals played a huge part in all the tweets, and certain countries showed up more than others.

Here's my solution for week #89. I found a few interesting issues while exploring the data (in the spoiler tag).  

For next week (#90): I'm going to challenge myself to use Alteryx's reporting tools, so I'm going to keep the analysis pretty basic. I'm going to look at:

  1. The frequency of each hashtag
  2. The count of distinct users using each hashtag
  3. The timing (did the hashtags peak at different times?)
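
Outside of Alteryx, the three measures above could be sketched in plain Python roughly as follows. This is only an illustration: the sample rows are made up, and the field names `user`, `hashtag` and `created` are assumptions rather than the actual column names in the challenge files.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Made-up sample rows; "user", "hashtag" and "created" are assumed field names.
rows = [
    {"user": "a", "hashtag": "globalgoals", "created": "2017-10-01 09:15:00"},
    {"user": "a", "hashtag": "GlobalGoals", "created": "2017-10-01 10:40:00"},
    {"user": "b", "hashtag": "globalgoals", "created": "2017-10-02 08:05:00"},
    {"user": "b", "hashtag": "act4sdgs", "created": "2017-10-02 08:05:00"},
]


def summarize(rows):
    """Return hashtag frequency, distinct users per hashtag, and busiest day per hashtag."""
    frequency = Counter()
    users = defaultdict(set)
    per_day = defaultdict(Counter)
    for row in rows:
        tag = row["hashtag"].lower()  # treat #GlobalGoals and #globalgoals as one tag
        day = datetime.strptime(row["created"], "%Y-%m-%d %H:%M:%S").date()
        frequency[tag] += 1
        users[tag].add(row["user"])
        per_day[tag][day] += 1
    distinct_users = {tag: len(names) for tag, names in users.items()}
    # The day with the most tweets is used as a simple stand-in for the "peak".
    peak_day = {tag: days.most_common(1)[0][0] for tag, days in per_day.items()}
    return frequency, distinct_users, peak_day


frequency, distinct_users, peak_day = summarize(rows)
```

Counting by day is just one possible grain for the timing question; hourly or weekly buckets may suit the data better.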

Some findings from exploring the week 89 data:

1 - If a tweet had multiple hashtags, the tweet may be duplicated across the files. The ID field is unique.
2 - All of the CSV files have the same schema.
3 - Hashtags are not always capitalized in the same way; you may want to convert them to all uppercase or all lowercase if you are using case-sensitive formulas or tools.
4 - The Tweet field is sometimes truncated, and in some cases the hashtags were cut off. If a hashtag does not appear in the Tweet field, then it also does not appear in the Hashtag field. As a result, the Hashtag field is sometimes null.
5 - Since we know the tweets were harvested by hashtag, every tweet in a file should contain that file's hashtag. For example, every tweet in the 'globalgoals' file should contain the #globalgoals hashtag. We can rebuild the Hashtag field to include the 10 hashtags of interest, but if any other hashtags were truncated, we have no way of knowing about them.

Here is an example.  This tweet (ID# 914489241266278000) does have the #act4sdgs hashtag, but it is truncated from the Tweet field and thus not present in the Hashtag field in the act4sdgs csv file.
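
As a rough illustration of findings 3 to 5, here is a small Python sketch that rebuilds a tweet's hashtag list from whatever survives in the Tweet text plus the hashtag implied by the source file. The function name `rebuild_hashtags` and the sample text are hypothetical, and the sketch assumes hashtags are a `#` followed by word characters.

```python
import re

# Assumption: a hashtag is "#" followed by letters, digits or underscores.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def rebuild_hashtags(tweet_text, source_file_tag):
    """Return lowercase hashtags found in the text, plus the source file's hashtag."""
    found = [tag.lower() for tag in HASHTAG_PATTERN.findall(tweet_text or "")]
    file_tag = source_file_tag.lower()
    if file_tag not in found:
        # The tweet was harvested for this hashtag, so it belongs even if truncated away.
        found.append(file_tag)
    # Drop repeats while keeping first-seen order.
    return list(dict.fromkeys(found))


# Made-up truncated text from a hypothetical row in the act4sdgs file.
tags = rebuild_hashtags("Join the #GlobalGoals week of action and sha", "act4sdgs")
```

One caveat: a hashtag that is cut off partway through would be picked up as a fragment, so the result may still need a manual check.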



Simple data cleanup

Solution 89.png
Challenge #89.PNG

Kept it simple. Just data cleansing and summarizing. Can also show tweets by region, unique users, etc.


I kept mine pretty simple - will decide how to summarize, sort and sample the data in the next challenge.


1. Parsed the hashtags to rows

2. Combined date and time

3. Replaced all null hashtags with filename

4. Changed all hashtags to the same case

5. Filtered out where hashtags contain ?

6. Removed dups based on ID, Tweet and Hashtag
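
For anyone who wants to compare against a scripted version, the six steps above might look something like this in plain Python. It is a sketch under several assumptions: the field names (`id`, `tweet`, `hashtag`, `date`, `time`), the date and time formats, and the idea that multiple hashtags are separated by spaces or commas are all guesses, and the sample rows are made up.

```python
from datetime import datetime

# Made-up sample rows; field names and formats are assumptions.
rows = [
    {"id": 1, "tweet": "t1", "hashtag": "GlobalGoals SDGs", "date": "2017-10-01", "time": "09:15:00"},
    {"id": 1, "tweet": "t1", "hashtag": "globalgoals", "date": "2017-10-01", "time": "09:15:00"},
    {"id": 2, "tweet": "t2", "hashtag": None, "date": "2017-10-02", "time": "08:05:00"},
    {"id": 3, "tweet": "t3", "hashtag": "???", "date": "2017-10-02", "time": "11:30:00"},
]


def clean(rows, file_tag):
    """Apply the six steps to the rows of one file; returns one row per tweet and hashtag."""
    cleaned, seen = [], set()
    for row in rows:
        # Step 2: combine date and time into a single timestamp.
        stamp = datetime.strptime(f'{row["date"]} {row["time"]}', "%Y-%m-%d %H:%M:%S")
        # Step 3: fall back to the file name when the hashtag is null.
        raw = row["hashtag"] or file_tag
        # Step 1: one output row per hashtag.
        for tag in raw.replace(",", " ").split():
            # Step 4: put every hashtag in the same case.
            tag = tag.lstrip("#").lower()
            # Step 5: skip hashtags that contain "?".
            if not tag or "?" in tag:
                continue
            # Step 6: remove duplicates based on ID, Tweet and Hashtag.
            key = (row["id"], row["tweet"], tag)
            if key in seen:
                continue
            seen.add(key)
            cleaned.append({"id": row["id"], "tweet": row["tweet"], "hashtag": tag, "created": stamp})
    return cleaned


result = clean(rows, "globalgoals")
```

The steps run in a slightly different order than listed (the null replacement has to happen before the split), but the outcome should match.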


Alteryx Partner

I wanted to use the cognitive service analytics tool, but it seems the Azure services are no longer free, so this was a chance to brush up on my rusty Python and use the new tool.


First, bringing all the tweets into a yxdb file:


Then a bit of processing: removing duplicates and getting each tweet's "polarity".
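
The workflow itself is not shown here, but a minimal sketch of scoring polarity with VADER might look like the following. It assumes the third-party `vaderSentiment` package is installed; the helper names `label_polarity` and `score_tweets` are hypothetical, and the ±0.05 cut-offs are the conventional thresholds suggested in the VADER documentation, not necessarily what was used in this solution.

```python
def label_polarity(compound):
    """Map a VADER compound score (-1 to 1) to a coarse label."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def score_tweets(texts):
    """Return (text, compound score, label) for each tweet text."""
    # Imported here so label_polarity can be used without the package installed.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    results = []
    for text in texts:
        # polarity_scores returns a dict with "neg", "neu", "pos" and "compound".
        compound = analyzer.polarity_scores(text)["compound"]
        results.append((text, compound, label_polarity(compound)))
    return results
```

Inside the Alteryx Python tool, the tweet texts would come from the incoming data stream rather than a plain list.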


Thanks to Zoe Wilkinson Saldaña for the detailed how-to on Python and VADER.

Alteryx Certified Partner

Kept it quite simple and did the number of tweets by hashtag. Also used the Alteryx Interactive Chart for the first time!


Number of Tweets.png



I chose to stick with the Alteryx reporting suite. Here are the visuals that I came up with.