I wanted to do some text analysis, so I thought it would be interesting to focus on the originally authored tweets and avoid the duplication caused by RTs.
I built a data prep workflow that excludes RTs and outputs two TDE files for use in Tableau. The first, with the tweet text parsed and prepared, feeds a word cloud. The second holds the date, dataset, and other tweet-level metadata, and is used to apply filters to that word cloud.
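For anyone who wants to try the same idea in code, here is a minimal Python sketch of that kind of prep. It is not the actual workflow: the sample records, the field names (`id`, `date`, `dataset`, `text`), the "starts with `RT @`" retweet test, and the tokenizing rules are all assumptions made for illustration, and it builds two in-memory tables rather than writing TDE files.

```python
import re

# Hypothetical sample records; the field names are assumptions for illustration.
tweets = [
    {"id": 1, "date": "2017-01-05", "dataset": "A", "text": "We need better data!"},
    {"id": 2, "date": "2017-01-05", "dataset": "A", "text": "RT @someone: We need better data!"},
    {"id": 3, "date": "2017-01-06", "dataset": "B", "text": "Check https://example.com #data"},
]


def is_retweet(text):
    """Assumed convention: a retweet's text starts with 'RT @'."""
    return text.lstrip().startswith("RT @")


def tokenize(text):
    """Lowercase, drop URLs, and keep alphabetic words (inner apostrophes allowed)."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text)


# Keep only the originally authored tweets.
originals = [t for t in tweets if not is_retweet(t["text"])]

# Table 1: one row per (tweet id, word), the input for the word cloud.
word_rows = [{"id": t["id"], "word": w} for t in originals for w in tokenize(t["text"])]

# Table 2: one row per tweet with its metadata, used for filtering.
meta_rows = [{"id": t["id"], "date": t["date"], "dataset": t["dataset"]} for t in originals]
```

Both tables share the tweet `id`, so a filter applied to the metadata table can be carried over to the word table through that key.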
UPDATE: Better handling of non-words.
I'm stuck in a loop and have a very specific question. Hope someone can help. Of course, if my question is not OK, please feel free to remove!
I noticed many tweets had a "We" phrase, such as "We must..." and "We need...". I pulled out the "We" phrases and counted them a few different ways. I could spend hours playing with this, but for Challenge #90 I'm just going to make a word cloud showing some of the more popular "We" phrases in the data.
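One possible way to pull out and count those phrases is sketched below in Python. It is only an illustration, not the method used for the challenge: the sample texts are made up, and a "We" phrase is assumed to mean the word "we" plus the single word that follows it.

```python
import re
from collections import Counter

# Hypothetical sample tweet texts for illustration.
texts = [
    "We must act now.",
    "We need more data, and we need it soon!",
    "I think we must do better",
    "Welcome to the challenge",
]

# "we" as a whole word followed by the next word.
# The \b and \s+ keep words like "Welcome" from matching.
WE_PHRASE = re.compile(r"\bwe\s+([a-z']+)", re.IGNORECASE)


def we_phrases(text):
    """Return each two-word 'we ...' phrase in the text, lowercased."""
    return ["we " + word.lower() for word in WE_PHRASE.findall(text)]


# Count every phrase across all texts.
counts = Counter(p for t in texts for p in we_phrases(t))
print(counts.most_common())
```

The resulting counts can serve as the size measure for each phrase in a word cloud. Capturing two or three following words instead would be a small change to the pattern.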