This week’s episode of the Alter Everything podcast showcases Carlene Jones, data and analytics consultant, and Nynne Haagensen, a data enthusiast who worked with Carlene. Carlene and Nyenne discuss the wide range of skills that data professionals need, which extend well beyond the technical aspects of data manipulation and analysis. Their conversation reinforces that people skills, communication abilities and business savvy are all critical to success in data science and analytics.
To explore online conversations around this skill set, I decided to gather and analyze some data, naturally. You may have recently read the first two parts of @SydneyF's fantastic topic modeling trilogy here on the Data Science blog (part 3 is coming soon!). This seemed like an opportunity to apply topic modeling to what folks have discussed out there on the interwebz about the data science skill set. (Topic Modeling is part of the Alteryx Intelligence Suite, which offers some awesome new text mining tools.)
I built a workflow in Designer that scraped 64 articles from the data science site KDnuggets tagged “skills” and cleaned up the text. I also used Text Pre-processing to quickly prep the remaining text before sending it into the Topic Modeling and Word Cloud tools. The word cloud below gives you a preview of some of the prominent ideas, but topic modeling lets us dig a little deeper.
I asked the Topic Modeling tool to identify three dominant topics in the text of these articles. You should definitely read all the details on how this process works, but in a nutshell: This is an unsupervised approach, meaning that I’m not specifying what I want the model to find in advance, but rather letting it identify on its own the key ideas in the text of the articles. This tool assumes that each chunk of text I feed it is a mixture of those three different topics, since I asked for three. It figures out how those topics are represented in each chunk based on the probability that certain words occur together. It doesn't give a name to the topics it finds, though; it needs us to figure out what its groupings of words mean.
The topic model that results from this analysis is open to interpretation, but here’s what I see. Topic 1 looks to describe the role of the data analyst or data scientist within an organization, with some technical terms mentioned (Python, SQL, Hadoop). However, it also includes concepts like “value,” “market” and “demand” that could reflect the business expertise a skilled data professional brings to the organization. Some of the chunks of original text that scored highly for the presence of Topic 1 include:
Topic 2 has “learning” as its most relevant term and “machine” in second place, so a quick conclusion would be that Topic 2 reflects the prominence of machine learning skills for data science. However, a closer review suggests that maybe “learning” could also be interpreted in another way. Some of the chunks of text that scored highly for Topic 2 include:
Some of the other terms included in this topic are “question,” “understand,” “team,” “approach” and “offer.” This topic seems to have a theme of ongoing learning and skill development for the data professional.
Finally, Topic 3 looks like it represents the intersection of technical skills and problem-solving, with terms “problem,” “solve,” “think,” “model,” and “code” showing up as highly relevant. “Math” also appears here, as do “research” and “concept,” suggesting some of the more specific intellectual skills useful in the data fields.
Yes, it is a lengthy list of skills indeed! This quick analysis suggests that in discussions of data science skills, there is a recurring emphasis not just on technical skills, but on the capabilities that put data analyses into human and business contexts. The best model or analysis doesn’t mean much without humans empowered to figure out the right problem-solving strategy, the questions to ask, the methods to use and the interpretation of their results.
Learn more about how Carlene and Nynne view the skills needed for a data-driven company culture and professional success in this week’s Alter Everything episode.
Susan Currie Sivek, Ph.D., is the data science journalist for the Alteryx Community. She explores data science concepts with a global audience through blog posts and the Data Science Mixer podcast. Her background in academia and social science informs her approach to investigating data and communicating complex ideas — with a dash of creativity from her training in journalism. Susan also loves getting outdoors with her dog and relaxing with some good science fiction. Twitter: @susansivek
Susan Currie Sivek, Ph.D., is the data science journalist for the Alteryx Community. She explores data science concepts with a global audience through blog posts and the Data Science Mixer podcast. Her background in academia and social science informs her approach to investigating data and communicating complex ideas — with a dash of creativity from her training in journalism. Susan also loves getting outdoors with her dog and relaxing with some good science fiction. Twitter: @susansivek
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.