Author: Michael Barone, Data Scientist
Company: Paychex Inc
Awards Category: Best Use of Predictive
Describe the problem you needed to solve
Each month, we run two dozen predictive models on our client base of 600,000 clients. These models include various up-sell, cross-sell, retention, and credit-risk models. For each model, we generally group clients into buckets that identify how likely they are to buy a product, leave us, default on payment, and so on. Getting these results into the hands of the end-users who will make decisions is an arduous task: there are many different end-users, and each can have specific criteria they are focused on (clients in a certain zone, clients with a certain number of employees, clients in a certain industry, etc.).
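The bucketing described above can be sketched in a few lines. This is a hypothetical illustration, not the production logic: the field names (score, zone, num_employees, industry), the score thresholds, and the sample clients are all assumptions.

```python
# Hypothetical sketch: bucket clients by a model's propensity score,
# then let an end-user slice the scored list by their own criteria.
# Field names, thresholds, and sample data are illustrative assumptions.

def bucket(score):
    """Map a 0-1 propensity score to a coarse likelihood bucket."""
    if score >= 0.7:
        return "high"
    if score >= 0.4:
        return "medium"
    return "low"

clients = [
    {"id": 1, "score": 0.82, "zone": "East", "num_employees": 12, "industry": "Retail"},
    {"id": 2, "score": 0.35, "zone": "West", "num_employees": 40, "industry": "Health"},
    {"id": 3, "score": 0.55, "zone": "East", "num_employees": 8,  "industry": "Retail"},
]

# Attach a bucket to every scored client.
scored = [dict(c, bucket=bucket(c["score"])) for c in clients]

# One end-user's criteria: East-zone retail clients only.
east_retail = [c for c in scored if c["zone"] == "East" and c["industry"] == "Retail"]
```

In practice each end-user would supply their own filter, which is exactly the "many different end-users, each with specific criteria" problem the app solves.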
Describe the working solution
I have a prototype app, deployed via Alteryx Server, that allows end-users to self-serve their modeling and client-criteria needs. It is not in production yet, but it could give end-users direct access to results without needing a go-between (my department) to filter and distribute massive client lists.
Step 1: ETL
The ETL workflow produces several YXDBs that are used in the models. Not every YXDB is used in every model. This creates a central repository of YXDBs, from which each specific model can pull what it needs.
Once all the YXDBs and CYDBs are created, we run our models. Here is just one of our 24 models:
The individual model scores are stored in CYDB format to make the app run fast, since the end-user will be querying against millions of records. Client information is stored in the same format for the same reason.
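The CYDB files here act as an indexed score store that the app can query quickly. As a rough stand-in for that idea, here is a sketch using SQLite; the table name, column names, and sample scores are assumptions, not the actual Paychex schema.

```python
import sqlite3

# Stand-in for the CYDB score store: an indexed table keyed by client and
# model, so per-model lookups stay fast even at millions of rows.
# Table/column names and sample data are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (client_id INTEGER, model TEXT, score REAL)")
conn.execute("CREATE INDEX idx_client_model ON scores (client_id, model)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [(1, "retention", 0.91), (1, "cross_sell", 0.40), (2, "retention", 0.15)],
)

# The app pulls one model's scores, highest-risk clients first.
rows = conn.execute(
    "SELECT client_id, score FROM scores WHERE model = ? ORDER BY score DESC",
    ("retention",),
).fetchall()
```

The point is the access pattern, not the storage engine: precomputed scores, indexed by client and model, queried interactively.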
Step 2: App
Step 3: Gallery
And then they can select the various client criteria:
Once the app finishes running (it takes anywhere between 10 and 30 seconds), they can download their results as a CSV:
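The final download step amounts to serializing the filtered, scored client list. A minimal sketch with Python's csv module, assuming hypothetical result fields:

```python
import csv
import io

# Hypothetical sketch of the CSV download step: write the filtered,
# scored client list out with a header row. Field names are assumptions.
results = [
    {"client_id": 1, "bucket": "high", "zone": "East"},
    {"client_id": 3, "bucket": "medium", "zone": "East"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["client_id", "bucket", "zone"])
writer.writeheader()
writer.writerows(results)
csv_text = buf.getvalue()  # what the end-user would download
```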
Describe the benefits you have achieved
I no longer have to send out two dozen lists to the end-users, and the end-users no longer have to wait for me to send them; they can get the lists on their own. The self-service tool makes the whole process more efficient and streamlined.
Author: Mandy Luo, Chief Actuary and Head of Data Analytics
Company: ReMark International
Awards Category: Best Use of Predictive
As a trained statistician, I understand why "70% data, 30% model" is not an exaggeration. Therefore, before applying any regression models, I always make sure the input data are fully reviewed and understood. I use various data preparation tools to explore, filter, select, sample, or join up data sources, and I use the data investigation tools to conduct or validate statistical evaluations. Next, I usually choose 3-5 predictive modeling candidates, depending on the modeling objective and data size, and I often include one machine learning method, at least to benchmark the other models. After the candidate models finish running, I select the best model based on both art (whether the coefficients look reasonable given my understanding of the data and the business) and science (statistical criteria such as goodness of fit, p-values, and cumulative lift). I also often use the render function for model presentation and the scoring/sorting functions for model validation and application.
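Of the selection criteria named above, cumulative lift is simple to compute directly: it compares the response rate among the top-scored fraction of cases with the overall response rate. A minimal sketch, with illustrative scores and outcomes:

```python
# Minimal sketch of cumulative lift, one of the model-selection criteria
# mentioned in the text. Scores and outcomes below are made-up examples.

def cumulative_lift(scores, outcomes, top_fraction=0.1):
    """Response rate in the top-scored fraction, divided by the overall rate."""
    ranked = sorted(zip(scores, outcomes), key=lambda p: p[0], reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    top_rate = sum(o for _, o in ranked[:k]) / k
    overall_rate = sum(outcomes) / len(outcomes)
    return top_rate / overall_rate

scores   = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1,   1,   0,   1,   0,   0,   0,   0,   0,   0]

# Lift in the top 20%: both of the two top-scored cases responded,
# versus a 30% overall response rate.
lift_top20 = cumulative_lift(scores, outcomes, top_fraction=0.2)
```

A lift well above 1 in the top deciles is what indicates the model ranks responders ahead of non-responders.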
Describe the problem you needed to solve
ReMark is not only an early adopter of predictive modeling for life insurance, but also a true action-taker on customer centricity, focusing on customer lifetime analytics rather than on 'buying' alone. In this context, we need to join up our predictive models on customer response, conversion, and lapse in order to understand the most powerful predictors of customer activity across the pre- and post-sales cycle. We believe the industry understands that it is insufficient to focus on any single customer activity, but it is still exploring how this can be improved through modeling and analytics, and that is where we can add value.
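Joining the response, conversion, and lapse models means lining up each customer's scores from the three models in a single record, so predictors can be compared across the cycle. A hedged sketch of that join, with made-up customer IDs and scores:

```python
# Hedged sketch: join per-customer scores from the response, conversion,
# and lapse models into one record per customer, keeping only customers
# scored by all three models. IDs and scores are illustrative assumptions.
response   = {101: 0.8, 102: 0.3}
conversion = {101: 0.6, 102: 0.2}
lapse      = {101: 0.1, 102: 0.7}

joined = {
    cid: {
        "response": response[cid],
        "conversion": conversion[cid],
        "lapse": lapse[cid],
    }
    for cid in response.keys() & conversion.keys() & lapse.keys()
}
```

With the scores joined, one can ask cross-cycle questions, e.g. which customers are likely both to convert and to lapse early.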
Describe the working solution
Our working solution follows these steps:
Describe the benefits you have achieved