Survival Analysis, Part 1: Introduction (You are here)
Survival Analysis, Part 2: Key Models
Welcome to this four-part blog series, where we introduce a powerful analytical tool called Survival Analysis. In this series, I will provide a beginner-friendly guide to help you understand this popular statistical method.
In the first part, I will introduce the key concepts of survival analysis and show you some use cases where it can be applied. In the second part, we will dive deeper into the key models in survival analysis. In the third part of the series, we will walk you through how you can perform survival analysis using Alteryx. Finally, we will see how we can perform survival analysis using Python.
Whether you are a marketing analyst, medical researcher, engineer, or social scientist, this series will help you understand how to analyze time-to-event data and predict survivability. So, let’s dive in!
Survival analysis is the statistical method that answers the all-important question: “How long until it happens?” It was originally developed in the medical field to predict the time until patient death (hence the name survival analysis). On a lighter note, survival analysis is now widely used in engineering, social sciences, and marketing analytics to predict what percentage of a group experiences a specific event as time passes, or to compare the time to an event across different groups.
I first encountered survival analysis when analyzing customer churn data, so why don’t we use churn to help us understand the topic?
In the case of customer churn, a survival analysis for a children’s clothing boutique can be visually represented in a graph like the one below. The x-axis of the graph represents the time elapsed since the customer's first visit to the store, while the y-axis shows the percentage of customers who continue to shop at the store (known as the retention rate). At each time point on the x-axis, the curve shows the percentage of the original customer population still active at the store.
Survival analysis also allows the comparison of multiple groups in the same chart, with each group being represented by its own line. Here, you can see the percentage of customers who continued shopping at the store among the groups who received weekly coupons and those who did not.
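To make the two charts above more concrete, here is a minimal sketch of how such retention curves could be computed. It assumes the Kaplan-Meier estimator, a common choice for this kind of curve; that is an assumption for illustration, not necessarily how the charts in this post were produced. The `kaplan_meier` helper, the group names, and all of the numbers are hypothetical.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate: [(time, share still active)] at each churn time."""
    churned = Counter(t for t, seen in zip(durations, observed) if seen)
    leaving = Counter(durations)  # customers who churned or were lost at time t
    at_risk = len(durations)      # customers still being followed
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = churned.get(t, 0)
        if d:
            # Share of the at-risk customers who did NOT churn at time t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve

# Hypothetical data: months since first visit, and whether churn was observed
# (False means the customer was still active when we stopped observing).
groups = {
    "weekly coupons": ([2, 4, 5, 6, 6, 6], [True, True, False, True, False, False]),
    "no coupons":     ([1, 2, 2, 3, 4, 6], [True, True, True, True, True, False]),
}

curves = {name: kaplan_meier(d, e) for name, (d, e) in groups.items()}
for name, curve in curves.items():
    print(name)
    for month, share in curve:
        print(f"  month {month}: {share:.0%} still active")
```

Each group gets its own list of (month, share still active) points, which is what one line on the comparison chart represents. In this made-up data the coupon group's curve stays higher, but real data would be needed to say anything about actual coupons.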
“Why do we even need survival analysis when we have machine learning?”
“Can’t we use classification models to predict churn?”
Yes, I hear you, and I agree: when it comes to predicting customer churn, it’s easy to get caught up in the hype of modern machine learning tools. And, let’s face it, who can resist the temptation of high accuracy rates? However, there is at least one area where machine learning-based classification models fall short, and that’s predicting when churn will occur. This is where survival analysis truly shines, and knowing the “when” is really valuable to businesses:
Censorship (more commonly called censoring in the statistics literature), in the context of survival analysis, refers to losing track of an instance (in our case, a customer) during an observation period, or to cases where the event (churn) has not been observed for a customer during this period. This is an important concept because if we don’t consider censorship, we will potentially introduce bias into our prediction: just because we haven’t observed a customer canceling a subscription doesn’t mean they never will. More specifically, there are three types of censorship:
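In practice, censoring usually shows up in how the data is encoded: each customer gets a duration plus a flag saying whether churn was actually observed. The sketch below illustrates only the case where the event has not happened by the end of the observation window (usually called right-censoring). The field names, dates, and the `to_duration_event` helper are all hypothetical.

```python
from datetime import date

STUDY_END = date(2023, 6, 30)  # hypothetical end of the observation window

# Hypothetical customers; churned_on is None if no churn was observed
customers = [
    {"id": "A", "first_visit": date(2023, 1, 1), "churned_on": date(2023, 3, 2)},
    {"id": "B", "first_visit": date(2023, 2, 1), "churned_on": None},
    {"id": "C", "first_visit": date(2023, 3, 15), "churned_on": date(2023, 5, 14)},
]

def to_duration_event(customer, study_end):
    """Return (days observed, churn_observed) for one customer."""
    churn_observed = customer["churned_on"] is not None
    # For censored customers we only know they lasted AT LEAST until study_end
    end = customer["churned_on"] if churn_observed else study_end
    return (end - customer["first_visit"]).days, churn_observed

records = {c["id"]: to_duration_event(c, STUDY_END) for c in customers}

# Naive approach: average lifetime over churned customers only
churned_days = [days for days, seen in records.values() if seen]
naive_mean = sum(churned_days) / len(churned_days)
```

Customer B never churned during the window, so the naive average ignores the longest-lasting customer entirely and understates how long customers stay. Keeping B in the data with `churn_observed = False` is what lets survival methods use that partial information.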
Survival analysis lets us analyze time-to-event data on a wide range of topics. We can apply it to predict almost any event of interest that happens over time, as long as we can define a clear start and end. Some of the common use cases include:
Stay tuned for the next articles in this series. You can subscribe to the data science blog to make sure you don't miss any!