
SusanCS
Alteryx Alumni (Retired)

My dog loves napping in his super-fuzzy dog bed. And I have to confess: I like to think I’m a rational consumer, but I bought him the bed because of cute photos and a discount code shared by a social media influencer. 

 

Identifying social media influencers who can help promote your business is both an art and a science. There are plenty of commercial services that say they can tell you who those people are. But why pay for that service when you can use a tool already at your fingertips to find and analyze potential influencers and their posts? Alteryx has network analysis capabilities that can help you identify these people and determine whether they’re a good fit for your needs.

 

Let’s take a closer look at the Network Analysis Tool and build our own workflow to identify potential Twitter influencers.






Retrieving and Preparing Tweets

A while back, I demonstrated how to retrieve and analyze tweets using the Twitter API, the user-created Twitter API Authorization Header macro, and the Sentiment Analysis Tool from the Alteryx Intelligence Suite. 

 

You can use the approach and workflow provided in that post to get started on our influencer identifier. You might choose a keyword, a location, or — as I will do here — a hashtag relevant to your interests as your starting point.

 

I’m going to look at tweets with the hashtag #ODSCEast from the recent Open Data Science Conference East. One use for these tweets could be identifying influencers who might be helpful in promoting our Data Science Mixer podcast and/or could be future guests. 

 

I retrieved tweets using that hashtag twice a day for all three days of the conference, resulting in a collection of 600 tweets. Unfortunately, Twitter’s standard search limits access to tweets, but this sample is a good starting point. 






The number of followers someone has is just one possible measure of influence on Twitter. Another way to think about influence is to examine who is frequently connected with other people in actual tweets, that is, who is often linked with others through common interests and broad recognition. In the case of this conference, people might be mentioned together in tweets related to upcoming sessions or talks, revealing connections that wouldn’t be evident otherwise. Users who co-occurred often with other users in the collected tweets could be key connections, helpful for reaching a wide audience. This is the approach we’ll try here.

 

After parsing the Twitter data, I wanted just the usernames of everyone mentioned in the tweets, so I used the RegEx Tool and the expression @(\w+) to tokenize the usernames into rows. With a big assist from @NeilR on the data wrangling, plus some ideas from this post by @BenMoss, everything eventually got into the form I wanted prior to network analysis: a two-field, 155-row table with the pairs of usernames that had actually appeared together in tweets, and a one-field, 115-row table with just the unique usernames of everyone who had shown up in any tweet. The first rows of each table are below.

 

 

SusanCS_2-1618848620379.png

 

SusanCS_3-1618848620337.png
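For readers who want to see the shape of that data preparation outside Designer, here is a minimal Python sketch of the same idea. It is not the workflow used in this post: the sample tweets are made up, the `mention_tables` helper is a hypothetical name, and pairing every two accounts mentioned in the same tweet (with duplicates removed) is an assumption about what "appeared together" means.

```python
import re
from itertools import combinations

# Made-up sample tweets; the real workflow used tweets retrieved via the Twitter API.
tweets = [
    "Excited for the talk by @alice and @bob at #ODSCEast, hosted by @odsc",
    "Great session! Thanks @odsc and @alice",
    "No mentions here, just #ODSCEast",
    "@carol joins @odsc on stage",
]

# Same expression as in the RegEx Tool: capture the username after each "@".
MENTION = re.compile(r"@(\w+)")


def mention_tables(texts):
    """Return (pairs, users): co-mentioned username pairs and unique usernames."""
    pairs = set()
    users = set()
    for text in texts:
        # Lowercase and de-duplicate the usernames found in one tweet.
        names = sorted({name.lower() for name in MENTION.findall(text)})
        users.update(names)
        # Every two accounts mentioned in the same tweet form one pair.
        pairs.update(combinations(names, 2))
    return sorted(pairs), sorted(users)


edges, nodes = mention_tables(tweets)
for pair in edges:
    print(pair)
print(nodes)
```

Sorting the names within each tweet keeps a pair such as (alice, odsc) from also showing up as (odsc, alice), which matters if you treat the network as undirected.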

 


Constructing the Network

As usual, the process of generating those two tables took a lot longer than actually analyzing the data! I used the Network Analysis Tool to see how the Twitter users I identified were all interconnected in the tweets I’d gathered. 

 

Let’s start with the resulting diagram of the network and work backwards to explore how it was formed. The interactive dashboard below is available from the I output of the Network Analysis Tool. (You can also export it to various formats with the Render Tool, such as HTML, which is how I was able to embed the diagram below.)



 

 

In this diagram, the circles are “nodes.” Each Twitter user identified here is considered a node in this network. The lines between the nodes are called “edges.” As you can see in the network graph, most edges lead to @odsc, the Twitter account of the organizers of the conference, and it makes sense that they would end up central to the discussion of their own event. 

 

However, as I mouse over and click on the individual nodes, it looks like nodes other than @odsc are also pretty well interconnected. For example, @aliciaframe1 mentioned other users or was mentioned by them fairly often, as revealed by the blue nodes and edges below:

SusanCS_5-1618848620373.png

 

 

In addition to exploring the interactive diagram, I can also use the numeric output from the Network Analysis Tool to examine my potential influencers more closely. The output includes five network centrality measures, each of which reflects a different way of evaluating how “central” a node is to a network. You can read about all the centrality measures, but here are simplified definitions of each:

 

 

  • Betweenness: the number of times a node serves as a bridge on the shortest path between other nodes. A node that is often a bridge can control the spread of information, allowing or limiting its flow.
  • Degree: the number of nodes one link away from any one node. As one source states, “Though simple, degree is often a highly effective measure of the influence or importance of a node: In many social settings people with more connections tend to have more power and [are] more visible.”
  • Closeness: based on the average length of the shortest paths from a specific node to all the other nodes in the network. The score is typically reported as the inverse of that average, so the more central a node, the closer it is to all the other nodes and the higher its score. This measure is sometimes used to reflect how quickly information might spread among nodes in a network.
  • Eigenvector centrality (“evcent” field in Designer): a measure of how influential a certain node is within the network, assigned relative to all the other nodes. The score is based on the idea that connections from “high-scoring” nodes are more valuable than connections from “low-scoring” nodes.
  • PageRank: yes, that PageRank you may have heard of. It’s somewhat similar to eigenvector centrality, but it also includes the direction of the links between nodes and the weight or importance of those links, which can help identify people perceived as authoritative by others. 
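To make a few of these definitions concrete, here is a small Python sketch that computes degree, closeness, and a basic PageRank on a toy network. It is only an illustration under stated assumptions, not how the Network Analysis Tool computes its output: the edge list is made up, the network is treated as undirected and unweighted, and closeness is taken as the inverse of the average shortest-path length. The other two measures are left out for brevity.

```python
from collections import defaultdict, deque

# Made-up edge list, shaped like the two-field table of username pairs.
edges = [("odsc", "alice"), ("odsc", "bob"), ("odsc", "carol"), ("alice", "bob")]


def build_graph(edge_list):
    """Build an undirected adjacency map: node -> set of neighbors."""
    graph = defaultdict(set)
    for a, b in edge_list:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree(graph):
    """Number of nodes one link away from each node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def closeness(graph):
    """Inverse of the average shortest-path length to all reachable nodes."""
    scores = {}
    for start in graph:
        dist = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search gives shortest paths in hops
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total = sum(dist.values())
        scores[start] = (len(dist) - 1) / total if total else 0.0
    return scores


def pagerank(graph, damping=0.85, iterations=100):
    """Basic power-iteration PageRank; each undirected edge counts both ways."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        new_rank = {}
        for node in graph:
            # Each neighbor shares its current rank equally among its links.
            incoming = sum(rank[nbr] / len(graph[nbr]) for nbr in graph[node])
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


graph = build_graph(edges)
deg = degree(graph)
close = closeness(graph)
pr = pagerank(graph)
for user in sorted(graph, key=pr.get, reverse=True):
    print(f"{user:6s} degree={deg[user]} closeness={close[user]:.2f} pagerank={pr[user]:.3f}")
```

In this toy network the hub account comes out on top of all three measures, much like @odsc does in the real data, while the other accounts separate out by how many connections they have.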



As you would expect from the top diagram above, the @odsc account scores highest on all the centrality measures. However, looking further into the data reveals which individuals and companies were notable nodes during the conference. 






Following this procedure with the goal of identifying influencers, you might be most interested in the degree or PageRank metrics. It would also be helpful to join your network analysis output with the original user information retrieved from Twitter in order to have their centrality measures, profile, links, and follower data all together. This information will enrich your new insights into how these users have co-occurred with others in the collected tweets. You could then sort by followers, find users in specific locations, and also evaluate their centrality within the relevant network. 
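In Designer, that combination would be a join followed by a sort. As a rough Python sketch of the same logic, assuming made-up records and hypothetical field names (`user`, `degree`, `pagerank`, `followers`, `location`), it might look like this:

```python
# Made-up centrality output; field names are hypothetical.
centrality = [
    {"user": "odsc", "degree": 3, "pagerank": 0.37},
    {"user": "alice", "degree": 2, "pagerank": 0.24},
    {"user": "bob", "degree": 2, "pagerank": 0.24},
    {"user": "carol", "degree": 1, "pagerank": 0.15},
]

# Made-up profile data; "carol" is missing on purpose to show the join behavior.
profiles = [
    {"user": "odsc", "followers": 40000, "location": "Boston"},
    {"user": "alice", "followers": 1200, "location": "Denver"},
    {"user": "bob", "followers": 5300, "location": "London"},
]


def join_on_user(centrality_rows, profile_rows):
    """Inner join: keep only users present in both tables."""
    by_user = {row["user"]: row for row in profile_rows}
    joined = []
    for row in centrality_rows:
        profile = by_user.get(row["user"])
        if profile is not None:
            joined.append({**row, **profile})
    return joined


# Rank by PageRank first, then use follower count to break ties.
ranked = sorted(
    join_on_user(centrality, profiles),
    key=lambda r: (r["pagerank"], r["followers"]),
    reverse=True,
)
for row in ranked:
    print(row["user"], row["pagerank"], row["followers"], row["location"])
```

Whether you sort by centrality first or followers first depends on what you want from an influencer: reach within this particular conversation, or raw audience size.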

 

And, to get extra meta, you could even retrieve the lists of followers of your first round of potential influencers, and add them to your network analysis. Doing so would enlarge the network and might introduce people less tightly connected to your main search topic. However, if your initial gathering of account names resulted in a small number of potential influencers, this additional collection might help you identify more people to consider.



Investigating the Influencers

Finally, you can use this same process to retrieve a sample of potential influencers’ recent tweets, then automate “reading” their past posts. With the Alteryx Intelligence Suite tools for word clouds and sentiment analysis, you can quickly get a sense of the content and tone of your influencer candidates’ social discussions.
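A word cloud is, at heart, a picture of word frequencies. The sketch below counts the most common terms in a few made-up tweets with plain Python; it is not the Intelligence Suite's method, and the tiny stopword list and the `top_terms` helper are assumptions made for illustration.

```python
import re
from collections import Counter

# Made-up tweets from a hypothetical influencer candidate.
tweets = [
    "Loved the #ODSCEast keynote on graph data science!",
    "Graph algorithms and data science go together. #ODSCEast",
    "New podcast episode on data ethics is out",
]

# Deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"the", "on", "and", "is", "out", "go", "new", "a", "of", "to"}


def top_terms(texts, n=5):
    """Count the most frequent words, skipping hashtags, mentions, and stopwords."""
    words = []
    for text in texts:
        for token in re.findall(r"[#@]?\w+", text.lower()):
            if token[0] in "#@" or token in STOPWORDS:
                continue
            words.append(token)
    return Counter(words).most_common(n)


terms = top_terms(tweets, n=3)
for word, count in terms:
    print(word, count)
```

Even a simple count like this gives a quick read on what a candidate talks about most, before you bring in sentiment scoring.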

 

Whether you’re selling dog beds to indulgent pet parents, building a podcast audience, or spreading public health information, social media influencers can be a powerful resource for disseminating your message. Get a handle on their conversations quickly with these tools. 



How have you used network analysis or social media data? Do you still have questions? Which other tools or data science concepts would you like to see addressed here on the blog? Let me know with a comment below, and subscribe to the blog to get future articles.







Blog teaser image by Johannes Groll on Unsplash

Susan Currie Sivek
Senior Data Science Journalist

Susan Currie Sivek, Ph.D., is the data science journalist for the Alteryx Community. She explores data science concepts with a global audience through blog posts and the Data Science Mixer podcast. Her background in academia and social science informs her approach to investigating data and communicating complex ideas — with a dash of creativity from her training in journalism. Susan also loves getting outdoors with her dog and relaxing with some good science fiction. Twitter: @susansivek


Comments
atcodedog05
22 - Nova

Hi @SusanCS 

 

This is such a hot topic and an amazing read. Knowing that this can be implemented with the help of Alteryx is just mind-blowing.

 

Thank you for the article 🙂

SusanCS
Alteryx Alumni (Retired)

Thanks, @atcodedog05! So glad you enjoyed the article. Let us know how it goes if you try this out! 😀

EJ_Alt
7 - Meteor

Thanks @SusanCS I will be trying this out in the next couple of days to track sentiment in the London Mayoral elections here in the UK.

SusanCS
Alteryx Alumni (Retired)

That sounds awesome, @EJ_Alt. Thanks for commenting! How will you be doing your sentiment analysis, if you can share? What an interesting project. 

Ken_Black
9 - Comet

Susan,

 

I arrived at this article by reading the awesome Alteryx-based recommendation engine article (https://www.linkedin.com/feed/update/urn:li:activity:6841434493027397632/). Isn't that interesting?  What I find so amusing is that I started my LinkedIn article by making a confession, and so did you in this article! Is this a case of great minds thinking alike?

 

What I love about your work is that in several cases, I have been able to go back into my archives and see how I was trying to do similar work many years ago. I previously showed that with word clouds. Now I can go back to my oldie-but-goodie articles (circa 2013) and show how I was processing Twitter content for various topics of interest. This once again shows how far Alteryx has come in developing world-class capabilities for us to use.

 

Thank you very much for your fun articles,

 

Ken

 

Twitter_on_my_blog.JPG

 

SusanCS
Alteryx Alumni (Retired)

@Ken_Black, that's so cool to see! You've just been ahead of your time. 😄 I'm glad you enjoyed this article and the one on the recommendation engine. Isn't it fun to get that look behind the scenes? Inspiring stuff for sure.