
SydneyF
Alteryx Alumni (Retired)


 

As long as there has been scientific research, there have been interested enthusiasts. Over the past century, science has moved from the realm of self-funded “gentleman scientists” (now called independent scientists, which is probably a better term, but sounds less fun) to that of institutionally employed professional scientists.

 

Although having someone else bankroll your research typically makes for a more stable experience, it does take away some of the freedom to research whatever you want. Making professional research the norm also added a barrier to entry to scientific inquiry. To do science, you now had to be a professional academic with the associated education and credentials. Science, in this paradigm, is a full-time engagement, not something the general public can take part in as a part-time interest or hobby. Think of this transition as the formal birth of the ivory tower.

 

Enter citizen science. First coined in the 1990s and given an official definition in the Oxford dictionary in 2014, citizen science is the participation of the general public in scientific research, often in collaboration with or under the direction of a scientific institution or professional scientist. Citizen science leverages the strength of the public to identify research questions, collect and analyze data, and interpret results. A citizen scientist is someone who engages with and contributes to scientific work, not necessarily with a scientific background.

 

I wasn’t there, but when the people at Gartner coined the term “Citizen Data Science”, I have to imagine they were thinking about the citizen science movement. It makes sense: there are a number of parallels between a citizen scientist and a citizen data scientist.

 

Both citizen scientists and citizen data scientists participate in technical work while not necessarily coming from a technical background, both movements are a response to a need to democratize something valuable but potentially esoteric, and both movements result in mutually positive outcomes for citizens and professionals alike.

 

Citizen data science is a hot topic, but it hasn’t been well defined. Here, I’d like to propose a definition for the citizen data scientist that parallels that of a citizen scientist and examine the citizen data science movement through the lens of citizen science.

 

Let’s start with the definition. Citizen data scientists are people who leverage tools, methods, and best practices developed by data scientists to analyze data without having an extensive background (professional or academic) in mathematics or computer science. Yes, I’ve taken a stand and defined it. Please don’t throw things at me over the internet.

 

Drivers of Citizen (Data) Science

 

Citizen science has become more and more widespread in academic research because of a few different phenomena, both social and technological.

 

Social drivers for the rise of citizen science include increased demands for transparency in scientific processes and a desire to be involved in how research is conducted and how the resulting political policy is enacted.

 

Technologies like the internet and smartphones have made us more connected than ever before and have given us more access to the low-cost sensor technologies often used in scientific research. These tools allow people to connect with projects and gather data more easily than ever. In parallel with this deluge of technology and data, we, as a society, are also more interested in, and hungrier for, data than ever before.

 

It’s kind of like a positive feedback loop (think: snowball rolling down a hill). The more technology is able to capture new data, the more data we want to capture and examine. Gathering data has always been an important part of the scientific method, and now mechanisms to capture data are more accessible and widespread than ever before – to the point where it has seeped over into everyday life. We now carry around a computer that acts as a two-way data device – while we search for information on the internet or connect with people around us, it captures that information to be sold and leveraged by marketing departments.

 

The existence of, and interest in, the abundant data that now surrounds us is a major factor driving the emergence of data science. As we capture more data (both larger quantities and data on new phenomena), there is a need to analyze and leverage the information that data contains. Data science is a field that has risen out of industry rather than academia.

 

Because of these origins, the citizen data science movement has a slightly different flavor of social drivers than citizen science. This is the part you’ve probably read about citizen data scientists before: the demand for professional data scientists, and for people who can analyze data to derive value, has outpaced the supply of highly trained data scientists, resulting in a demand for (drumroll) citizen data scientists.

 


 

The citizen data scientist is someone who knows enough about predictive modeling and data analysis to be dangerous. They can implement processes and procedures defined by Ph.D. data scientists, using Python and R packages or GUI tools (like Alteryx!). With a combination of whatever background they come from, data literacy, and a drive to learn and understand different modeling techniques, they can derive new insights effectively and at a meaningful scale. Their backgrounds outside of math and computer science can also give them skills that make their contributions meaningful: domain expertise, strong communication and presentation skills, creative thinking, and leadership are all qualities that can make a citizen data scientist effective.

 

This is similar to how a citizen scientist works: they gather or analyze scientific data by following a protocol developed by professional scientists, so that data can be gathered, combined, and analyzed while ensuring high-quality outcomes. There are concerns about the veracity of citizen science, but these are often addressed by well-established procedures and feedback loops, by verifying data at scale against observations from multiple citizen scientists, and by the involvement and shepherding of a professional scientist coordinating the research.

 

Citizen scientists are volunteers. Citizen data scientists are also volunteers, but not necessarily in the same sense. Where a citizen scientist will contribute their free time to research, citizen data scientists often emerge in the workplace as curious self-starters who volunteer to take on the biggest data challenges their teams are faced with.

 

A citizen data scientist is the person asking for more data, new data, and different data. They are exploring the techniques and tools developed by data scientists.

 

The drive to learn about data science is what separates a citizen data scientist from an analyst, or someone curious about data. It’s about wanting to use advanced and complex analysis methods or machine learning models in the name of discovery. So much of data science is driven by an inquisitive, scientific mindset. This same mindset is what primes someone to become a citizen data scientist in their workplace.

 

Outcomes of Citizen (Data) Science

 

The goals and outcomes of citizen science are often as much social as they are scientific. A major benefit of citizen science, of course, is that scientists can gather data at a scale that was previously time-prohibitive, cost-prohibitive, or simply impossible. Having thousands of “research assistants” in “the field” across the globe is a huge advantage for capturing data, and researchers have pursued many different projects, ranging from ecological surveys to street-level air pollution measurements in cities.

 

Beyond the massive benefit of scale that citizen science unlocks, public engagement also brings a major social benefit. By engaging citizen scientists, professional researchers can make sure the questions they are asking are relevant and important to society. Citizen scientists can also add different perspectives in the analysis and interpretation phases. Engaging citizen scientists can even have implications for how political policy is written and passed: if citizens are familiar with topics at a research level (e.g., climate change), you develop an informed populace that can make informed voting choices, or advocate for themselves and their community by reaching out to lawmakers.

 

The educational opportunity of citizen science lets people learn about the specific topic being researched while also gaining general scientific literacy. This scientific literacy is one of the most significant parallels between the outcomes of citizen science and citizen data science.

 

We live in interesting times. More and more processes are being automated and handed over to AI. The more data- and AI-driven decisions we make, and the more decisions are made for us, the more our populace needs a certain level of data literacy, algorithm literacy, and computer literacy. Citizen data science, in addition to being a way to level up a career, is a pathway to data and computer literacy, which will become increasingly necessary as our society becomes more digitized.

 

The Future for a Citizen Data Scientist

 

I think it is important to highlight here that the democratization of science or data science doesn’t decrease the need for professional scientists or data scientists. We still need mentors, people exploring the advanced questions, and people developing the procedures and algorithms that citizens implement. Citizen scientists and citizen data scientists offer a unique and meaningful perspective on scientific analysis, and it is important that their voices are incorporated in decision making. Citizen data scientists also increase the overall data fluency of an organization, which makes a data scientist’s job easier by enabling clearer communication between different SMEs and stakeholders.

 


 

I think it is also important to highlight here that citizen data scientist-ship does not necessarily mandate a career path toward becoming a data scientist. It is totally reasonable, and good, to engage with analytics and data science at the citizen level and add these skills to your tool kit on your career path. You might want to be a product manager, C-suite executive, or marketing expert with the literacy necessary to engage with data scientists as well as SMEs; citizen data science is an avenue to achieve just that. I would even argue that citizen data scientists should and will become more common in both the professional and personal spheres as data continues to become a bigger part of our lives.

 

That being said, a citizen data scientist can become a professional data scientist. It comes down to the drive to gain the knowledge needed to become the person contributing to the tools, methods, and best practices that make up the field of data science. Your journey to that point might involve enrolling in MOOCs, going back to school, or just gaining and leveraging experience in data-science-adjacent professions.

 

Getting Involved as a Citizen Scientist

 

If this article inspired you to look into getting involved with citizen science, then I did my job right. The European Citizen Science Association has published an article on the Ten Principles of Citizen Science that might be an interesting read for context, and here is a (non-exhaustive) list of opportunities to get involved.

 

The Environmental Protection Agency (EPA), the National Park Service, and the Smithsonian Institution are a few of the governmental organizations that have formally integrated citizen science into their research programs.

 

SciStarter is an online community where you can find, join, and contribute to formal and informal science research projects and events. Scientific American also posts listings of active citizen science projects.

 

iNaturalist is a phone app you can download that allows you to identify the plants and animals around you. In the process, users of the app create high-quality data for scientists to leverage for research.

 

Getting Involved as a Citizen Data Scientist

 

If this article inspired you to look into getting involved with citizen data science, then I did my job right again. Boom! Nailed it! I’m taking tomorrow off. 😊

 

It all starts with a drive to learn, and there are countless resources scattered across the interwebs waiting for you. Check out the blog post Free Resources for Learning Data Science as a starting point. Alteryx also hosts an ongoing blog series, featuring some of the most important and fundamental topics for emerging citizen data scientists to learn about.

 

When you’re ready, think about writing for the Alteryx Data Science Blog. Writing is a great way to practice teaching your knowledge (which is the very best way to learn). It doesn’t have to be anything earth-shattering. Think about writing for yourself in the recent past. Anything you learned along the way will probably be helpful for someone walking down the same path. If you need further convincing, check out this article from Rachel Thomas of fast.ai.

 

Other advice: if you’re hoping to transition from citizen data scientist to data scientist, consider getting a data-science-adjacent job, such as analyst or data engineer, and working forward from there. As data science has become more popular, many people have flooded into the field from a wide variety of backgrounds, typically through MOOCs and Kaggle competitions. Taking a side door into the industry lets you gain valuable industry experience while training up to meet your end goal.

 

The world needs more analytical thinkers and people who are fluent in data and science. That is what is really beautiful about both citizen science and citizen data science. It is an open invitation for anyone (including you) to get involved and educated in something they think is interesting and important. All it takes is opting in to questioning, exploring, learning, and experimenting.

Sydney Firmin

A geographer by training and a data geek at heart, Sydney joined the Alteryx team as a Customer Support Engineer in 2017. She strongly believes that data and knowledge are most valuable when they can be clearly communicated and understood. She currently manages a team of data scientists that bring new innovations to the Alteryx Platform.
