
This week’s podcast episode features a discussion between @TuvyL, manager of our ACE Program, and @DavidN, vice president for commercial sales at Alteryx. Among lots of great observations about career paths and success, they discuss the differences among genders in how people approach mentorship and asking for support at work -- and how important it is to promote diverse voices and perspectives.

 

Data science as a field has its own shortcomings with regard to diversity. One source of information on the variety of folks working in this field is the 2019 Kaggle ML & DS Survey, which asks professionals and students to respond to an array (data science pun intended) of questions about their gender, job title, income, nationality and more. (Race is not one of the included questions.) The survey data from 2017 and 2018 are also available, so this is a nice opportunity for longitudinal analysis.

 

I thought I’d take a closer look at the Kaggle data with Designer to see if any notable patterns in gender emerged, particularly around students learning data science. After all, today’s students will help determine the future of diversity in this field.

 

Change Over Time

I first wanted to see whether gender diversity among students had changed over the three years of the Kaggle survey. I found that from 2017 to 2019, the proportion of student survey respondents identifying as female actually decreased slightly. The proportions of respondents identifying as male and of those offering other answers (which may have included “prefer not to respond” or “prefer to self-describe”) held roughly steady. There hasn’t been a shift toward greater equity among genders, at least in these data.
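The analysis here was done in Designer, but for readers who prefer code, the same kind of year-by-year breakdown can be sketched in a few lines of Python. This is only an illustration: the records below are toy values, not Kaggle responses, and the `normalize` helper and the "Other" bucket are assumptions about how the answers might be grouped.

```python
from collections import Counter

# Toy records, NOT real Kaggle responses: (survey_year, gender_answer).
responses = [
    (2017, "Female"), (2017, "Male"), (2017, "Male"), (2017, "Male"),
    (2018, "Female"), (2018, "Male"), (2018, "Male"), (2018, "Prefer not to say"),
    (2019, "Female"), (2019, "Male"), (2019, "Male"), (2019, "Male"),
    (2019, "Prefer to self-describe"),
]


def normalize(answer):
    """Collapse every answer other than Female/Male into one 'Other' bucket."""
    return answer if answer in ("Female", "Male") else "Other"


def gender_shares_by_year(records):
    """Return {(year, gender): share of that year's respondents}."""
    counts = Counter((year, normalize(answer)) for year, answer in records)
    totals = Counter(year for year, _ in records)
    return {(year, g): n / totals[year] for (year, g), n in counts.items()}


shares = gender_shares_by_year(responses)
for (year, gender), share in sorted(shares.items()):
    print(f"{year} {gender:<6} {share:.1%}")
```

Dividing by each year's own total matters because the number of respondents differs from one survey year to the next, so raw counts alone would not be comparable.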

 

[Figure: Students by gender, 2017-19]

 

 

Gender and Nationality

Which countries are producing the most female data science students? I looked at just the 2019 Kaggle responses and broke them down by nationality and gender. I then joined those data with countries’ population data and calculated how many data science students each country had per 10,000 people in its population -- and how many were female. (Of course, these are just Kaggle survey respondents, presenting complications I’ll discuss more below.) 
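As a rough illustration of the join-and-normalize step described above, here is a minimal Python sketch. The country names, student counts, and populations are made up for the example (they are not survey results or real population figures), and the function name `students_per_10k` is hypothetical.

```python
# Toy numbers, NOT survey results or real populations.
# country -> (all student respondents, female student respondents)
student_counts = {
    "Country A": (120, 30),
    "Country B": (45, 9),
    "Country C": (10, 4),  # no population row, so it drops out of the join
}
populations = {"Country A": 60_000_000, "Country B": 5_000_000}


def students_per_10k(counts, pops):
    """Inner-join counts with population and scale to a per-10,000 rate."""
    rows = []
    for country, (total, female) in counts.items():
        pop = pops.get(country)
        if pop is None:
            continue  # keep only countries present in both tables
        rows.append({
            "country": country,
            "students_per_10k": total / pop * 10_000,
            "female_per_10k": female / pop * 10_000,
            "female_share": female / total,
        })
    # Highest per-capita student rate first.
    return sorted(rows, key=lambda r: r["students_per_10k"], reverse=True)


ranked = students_per_10k(student_counts, populations)
for row in ranked:
    print(row)
```

Note how the per-capita ranking can differ from a ranking by raw counts: in this toy data, Country B has fewer students than Country A but a higher rate per 10,000 people.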

 

The 15 countries with the highest number of data science students are included in the chart below, with the number of female students displayed as well. Even in the countries with the largest numbers of surveyed students, the proportion of women is relatively low.

 

 

[Figure: Data science students by country]

  

 

Gender and Education of Survey Respondents

Students of data science can pursue a variety of credentials. Again using the 2019 survey dataset, I examined differences in education levels among non-student respondents by gender. A slightly higher proportion of female respondents had either master’s degrees or doctoral degrees than did male respondents. 
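For a sense of how such a comparison could be computed in code, here is a small Python sketch that filters out students and then compares the share of advanced degrees by gender. The rows and the field names (`gender`, `role`, `education`) are hypothetical stand-ins, not the survey's actual columns or answers.

```python
# Toy rows, NOT real survey responses. Field names are hypothetical.
respondents = [
    {"gender": "Female", "role": "Data Scientist", "education": "Master's degree"},
    {"gender": "Female", "role": "Student", "education": "Bachelor's degree"},
    {"gender": "Female", "role": "Analyst", "education": "Bachelor's degree"},
    {"gender": "Male", "role": "Data Scientist", "education": "Doctoral degree"},
    {"gender": "Male", "role": "Engineer", "education": "Bachelor's degree"},
    {"gender": "Male", "role": "Analyst", "education": "Bachelor's degree"},
    {"gender": "Male", "role": "Student", "education": "Master's degree"},
]

ADVANCED = {"Master's degree", "Doctoral degree"}


def advanced_degree_share(rows):
    """Share of non-student respondents holding a master's or doctorate, by gender."""
    totals, advanced = {}, {}
    for row in rows:
        if row["role"] == "Student":
            continue  # the comparison covers non-students only
        gender = row["gender"]
        totals[gender] = totals.get(gender, 0) + 1
        if row["education"] in ADVANCED:
            advanced[gender] = advanced.get(gender, 0) + 1
    return {g: advanced.get(g, 0) / n for g, n in totals.items()}


advanced_share = advanced_degree_share(respondents)
for gender, share in advanced_share.items():
    print(f"{gender}: {share:.1%}")
```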

 

Though the difference isn’t enormous, it’s interesting to consider the potential reasons for it. Are women simply achieving advanced degrees at higher rates than men? Are women with higher degrees more likely to complete a survey, for whatever reason? Are women held to a higher educational standard by hiring managers in order to obtain their positions? 

 

 

[Figure: 2019 education by gender]

 

 

Data Limitations

Using Kaggle data to look at these issues isn’t an ideal approach, but it raises some interesting questions in itself. This survey was voluntary and (as far as I can tell) was provided only in English, even though respondents came from 171 different countries, so participation was limited. Additionally, Kaggle is in large part a competition website, where users respond to various challenges to prove their data mettle. That format may not be equally inviting to data science students and professionals of all genders and backgrounds. 

 

Asking survey respondents about gender is itself difficult, and my own analysis here is flawed because my “other responses” category lumps together every answer to the gender question other than male or female, including “prefer not to say” and “prefer to self-describe.” Additionally, the answer options for that question were different on the 2017 version of the survey.

 

 

Takeaways for the Future

Other data from national and global educational and professional institutions would offer additional insights, and perhaps more reliable ones. But as we explore paths to achieve greater diversity in the data professions, it’s interesting to observe these patterns and consider how to address them.

 

For those seeking a path into the data professions, the Alteryx ADAPT (Advancing Data & Analytic Potential Together) Program is a completely free online training opportunity that includes a software license for use in the program, collaborative discussions, data science resources, and certification. The program is available to anyone whose employment has been affected by the COVID-19 pandemic, including people who are unemployed or furloughed, or who have lost internship or post-graduation opportunities. Topics include an introduction to data fundamentals, Core Certification for Alteryx Designer, and predictive analytics for business. The program’s self-paced structure is well suited for people from varied backgrounds -- including those who may have found it challenging to pursue other upskilling opportunities. 

 

It will take all of us working together and supporting one another to improve diversity in the data professions. For more insights and firsthand experiences on how we can support each other’s efforts in this area, be sure to check out this week’s Alter Everything podcast episode.

 

 

Susan Currie Sivek
Senior Data Science Journalist

Susan Currie Sivek, Ph.D., is the data science journalist for the Alteryx Community. She explores data science concepts with a global audience through blog posts and the Data Science Mixer podcast. Her background in academia and social science informs her approach to investigating data and communicating complex ideas — with a dash of creativity from her training in journalism. Susan also loves getting outdoors with her dog and relaxing with some good science fiction. Twitter: @susansivek
