- Using the attached dataset (dataset.csv), analyze the data and visualize the most important aspects using your preferred method.
- Furthermore, share three ideas on how to decrease the churn rate. Document steps where needed.
- Split the data into train and test sets. Predict whether a shop will churn or not.
Please document your steps and the method used. The CSV file “sample_submission_file_churn” will help with the format.
The objective of this case study is to create a machine learning model that will predict whether a customer will churn or not.
@mboroto_89 do you need to use a certain set of tools? Or are you open to ideas? A couple of options for handling this:
- Alteryx Machine Learning Cloud (my preference, given it can build multiple models and rank them for you)
- Alteryx Intelligence Suite Assisted Modeling (similar to the above, but a Designer Desktop add-on)
- Leverage core predictive tools, something like logistic regression
Your problem is a supervised classification problem. Some steps:
- Join your sample file to your dataset file on Customer ID so that each customer row is mapped to a churned or not-churned label
- Once joined, we will use a classification method to solve this
- At a high level, you'll want to split your data into a training set and a test set (80/20 or 70/30)
- Build your model with whichever of the above methods you prefer (AYX ML Cloud is the EASIEST for non data scientists and also offers business explanations and feature importance)
- Score/predict the holdout/test set with the model
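For anyone following along outside Alteryx, the steps above can be sketched in Python with pandas and scikit-learn. This is only a rough illustration on synthetic data: the column names (`customer_id`, `order_quantity`, `wait_days`, `churned`) are assumptions, not the real columns in dataset.csv, and logistic regression is just the example model mentioned earlier.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the label file and dataset.csv (hypothetical columns).
rng = np.random.default_rng(0)
n_customers = 200
labels = pd.DataFrame({
    "customer_id": np.arange(n_customers),
    "churned": rng.integers(0, 2, size=n_customers),
})
interactions = pd.DataFrame({
    "customer_id": rng.integers(0, n_customers, size=1000),
    "order_quantity": rng.integers(1, 50, size=1000),
    "wait_days": rng.integers(1, 10, size=1000),
})

# Join the labels onto every interaction row by customer ID.
data = interactions.merge(
    labels, on="customer_id", how="inner", validate="many_to_one"
)

# The ID is only the join key, so it is left out of the features.
X = data[["order_quantity", "wait_days"]]
y = data["churned"]

# 80/20 split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Build the model on the training set.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Score/predict the holdout set.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Holdout accuracy: {accuracy:.3f}")
```

Because the labels here are random, the accuracy itself means nothing; the point is only the order of operations (join, split, fit, score).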
There's obviously more to this, but let me know if this helps.
Thanks @AlteryxMarco, will follow the steps provided.
One thing I am struggling with is whether to remove duplicate clients based on client ID. However, given that this is a classification problem identifying clients likely to churn, and there are multiple data points that represent different interactions, I suspect these may be valuable for predicting churn.
@mboroto_89 I would not remove duplicates on client ID (just don't use the ID itself as a feature to train the model). Like you said there are multiple interactions per customer, and it's likely ONE of those interactions if not multiple contributed to the actual churn. Example: Let's say I buy a single quantity of something and have no issues. Maybe I go back now and buy another 50 items in bulk, but I experience longer wait times for my order (which puts me off and I leave). If you were to remove one of those factors from the data then you lose that observation. This is kind of oversimplifying it, but the short of it is your customer ID is just your primary KEY ID and since you do have many observations per customer, you'll want to retain them so the model can learn from them.
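To make that concrete, here is a small pandas sketch of the example above. The column names and values are made up for illustration; the idea is just to keep every interaction row and leave the ID out of the feature set.

```python
import pandas as pd

# Hypothetical interactions: client 101 orders once, then orders 50 in bulk
# with a longer wait; client 102 has a single order.
orders = pd.DataFrame({
    "client_id": [101, 101, 102],
    "quantity": [1, 50, 3],
    "wait_days": [2, 9, 2],
    "churned": [1, 1, 0],
})

# Keep all rows, but drop the ID (and the label) from the features.
features = orders.drop(columns=["client_id", "churned"])
target = orders["churned"]

# For comparison only: de-duplicating on client_id keeps the first row per
# client and discards the bulk order with the long wait.
deduped = orders.drop_duplicates(subset="client_id")

print(features)
print(deduped)
```

In the de-duplicated frame the 50-item order is gone, which is exactly the observation the model would have needed.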
Some other helpful tips: maybe explore feature engineering. One example: is the delivery date a weekend? A weekday? A holiday? These could all factor into how the products show up, when they show up, etc.
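As a rough sketch of that kind of feature engineering in pandas, assuming a hypothetical `delivery_date` column and a made-up holiday list (neither is taken from the actual dataset):

```python
import pandas as pd

# Hypothetical delivery dates: Friday through Monday.
deliveries = pd.DataFrame({
    "delivery_date": pd.to_datetime(
        ["2023-01-06", "2023-01-07", "2023-01-08", "2023-01-09"]
    ),
})

# Made-up holiday calendar, purely for illustration.
holidays = pd.to_datetime(["2023-01-09"])

# Name of the weekday, e.g. "Friday".
deliveries["delivery_weekday"] = deliveries["delivery_date"].dt.day_name()

# dayofweek is 0 for Monday through 6 for Sunday, so 5 and 6 are the weekend.
deliveries["is_weekend"] = deliveries["delivery_date"].dt.dayofweek >= 5

# Flag dates that appear in the holiday calendar.
deliveries["is_holiday"] = deliveries["delivery_date"].isin(holidays)

print(deliveries)
```

The new columns can then be fed to the model alongside the original fields.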
Hope this helps!
@AlteryxMarco many thanks, this helps a ton! I created a workflow for day-of-the-week analysis, covering the days with the most orders and the days on which most orders are delivered. Of course, orders requested on a weekend have a longer TAT (turnaround time), i.e. 2 days, given that weekends are non-working days, especially Sunday.
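For reference, a day-of-the-week analysis like this could look roughly as follows in pandas. This is not the Alteryx workflow itself, only a sketch on invented data; `order_date`, `delivery_date`, and `tat_days` are assumed names.

```python
import pandas as pd

# Invented orders: one on Friday, two on Saturday, one on Monday.
orders = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2023-01-06", "2023-01-07", "2023-01-07", "2023-01-09"]
    ),
    "delivery_date": pd.to_datetime(
        ["2023-01-06", "2023-01-09", "2023-01-09", "2023-01-09"]
    ),
})

# Weekday on which each order was requested.
orders["order_weekday"] = orders["order_date"].dt.day_name()

# Turnaround time in whole days between request and delivery.
orders["tat_days"] = (orders["delivery_date"] - orders["order_date"]).dt.days

# Order count and average turnaround per requested weekday.
summary = orders.groupby("order_weekday").agg(
    order_count=("tat_days", "size"),
    mean_tat_days=("tat_days", "mean"),
)

print(summary)
```

In this toy data the Saturday orders are delivered on Monday, so they show a 2-day turnaround while the weekday orders show none.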