In late October of last year, Alteryx employees received a very intriguing email from Alteryx for Good (AFG). The Department of Health and Human Services (HHS) was sponsoring a codeathon aimed at incubating creative technical solutions to the opioid crisis. Alteryx would be a corporate sponsor of this event, and we would also be sending three teams of Alteryx associates and one team of ACEs to use their combined analytic badassery to help end this tragedy. AFG was soliciting volunteers from throughout the company to serve on these three teams. Wanting to use our data science and Alteryx skills for a positive impact, the five of us (@CristonS, @MacRo, @BrandonK, @BridgetT and @MattD) promptly volunteered. We didn’t know exactly what we were getting into, but we were eager to help.
HHS had three different focus tracks: prevention, usage, and treatment. We chose the usage track, and we aimed to assist various stakeholders in predicting who was at risk for opioid abuse.
What did we actually do?
At first, we struggled to settle on a single idea. After considering the breadth of datasets provided by HHS, we initially aimed to create an auto-joining technical marvel that would give a more comprehensive view of all the datasets, both now and in the future. We soon realized that this project, while very exciting, was not feasible within the twenty-four-hour time span of the codeathon. Thus, we decided to pursue another route.
We ultimately chose to create an analytic app, hosted on the Alteryx Gallery, that doctors and other medical providers could use to estimate a patient’s risk of opioid addiction. The app required only four pieces of information: the patient’s address, whether the patient was a smoker, whether the patient regularly experienced mental distress, and whether the patient drank excessively. Though using more information (more predictive variables) would have given a slightly more accurate risk assessment, we wanted to make an app that was simple enough that busy doctors would still want to use it and not see it as burdensome. We hoped that doctors could use the results from this app to identify patients who might need extra services, such as pain management counseling and therapy, in order to prevent them from becoming addicted to the very medications intended to help them.
We spoke with a nurse practitioner at the codeathon and told her about our app idea. She loved it and thought it could be a very useful tool for prescribers, and we started talking about ways to enhance it. Most notably, our model would almost certainly see significant uplift if it included patient medical histories (factors such as whether the patient had previously been prescribed opioids and in what amounts and frequencies, whether the patient had ever had substance-abuse treatment or an overdose, etc.). Such data are captured and stored, but they were not included in the datasets made available for this codeathon, one likely reason being the effort required to obtain and anonymize them. Overcoming such hurdles would certainly be a valuable step toward developing more effective models to fight the opioid epidemic. Even so, using just the data provided, we were able to build a very compelling model for understanding patient risk of opioid addiction.
How did we do it?
With lots of caffeine, of course! We didn’t get very much sleep, after all…
In all seriousness, there were three phases of our solution: the data preparation, the predictive modeling, and the app creation.
The data preparation was the most time-consuming part; we honestly don’t know how we would have finished in twenty-four hours without using Alteryx! Simply going through all the data sources was quite time-consuming (shout-out to Matt for taking care of that portion). Luckily, Brandon provisioned a VM for us all to use, so we could easily share our prepared and blended datasets. Throughout this portion, Criston used her experience with demographic data and Alteryx to guide us through the identification and preparation of the best datasets. We ultimately used the CDC Wonder dataset (which contains causes of death for deaths throughout the country), Medicare Part D Opioid Prescriber data, and a dataset from the University of Wisconsin that provided extensive county-level health information.
After we prepared the data, we could finally create the model, which was primarily Bridget's responsibility. Since the CDC Wonder dataset lacked sociological data (e.g., smoker status, mental health) about the patients, we had to use county-level variables as predictors in our model. We initially built an XGBoost model, since XGBoost had recently been a very successful induction algorithm in many data science competitions. However, our app required a model with coefficients, and an XGBoost model, which is an ensemble of trees, has none, so it was not suited to this application. Thus, we ultimately used an elastic net model, which can be produced using the Linear Regression tool in Alteryx Designer. (Technical note for anyone interested in the details: An elastic net model is similar to an ordinary linear regression model, but it includes a penalty on the size of the coefficients, so it tends to produce models that are less overfitted. Some parameter tuning is required to determine the precise nature of the penalty. However, the R implementation used in Alteryx Designer includes an option to use internal cross-validation, which simplifies the parameter tuning considerably.)
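For readers who want the penalty written out, here is one common formulation of the elastic net objective. It follows the parameterization used by R's glmnet package; whether the Designer tool uses exactly this form is an assumption on our part, and the symbols are introduced here purely for illustration.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
\;+\;
\lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\right]
```

Here $\lambda \ge 0$ sets the overall strength of the penalty and $\alpha \in [0, 1]$ mixes the ridge penalty ($\alpha = 0$) with the lasso penalty ($\alpha = 1$). These are the parameters that need tuning, and cross-validation is the usual way to choose $\lambda$.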
Once we had the elastic net model, we chose the standardized coefficients that were highest in magnitude and partitioned the corresponding variables into two groups. One group consisted of county-level variables, such as the percentage of the population that drove alone to work, and the percentage of children who were eligible for free or reduced-price lunch. The app determined the values of these variables based on the patient’s address using the University of Wisconsin dataset. The other group contained individual-level attributes, such as whether the patient was a smoker, and whether the patient regularly experienced mental distress. The values of these variables were determined directly from the corresponding values provided to the app. Ultimately, the final model used to produce the risk score combined both sets of values using coefficients determined by the intermediate elastic net model.
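The selection-and-scoring steps above can be sketched roughly as follows. This is a minimal illustration, not our actual workflow: the variable names, coefficient values, and helper functions are all hypothetical, and the sketch assumes that every input has already been standardized the same way the model's training data was.

```python
from typing import Dict, Set, Tuple

# Hypothetical standardized coefficients; the numbers are made up for illustration.
COEFFICIENTS: Dict[str, float] = {
    "pct_drive_alone": 0.42,
    "pct_free_lunch": 0.31,
    "smoker": 0.55,
    "mental_distress": 0.47,
    "excessive_drinking": -0.12,
    "pct_rural": 0.03,
}

# Variables answered directly in the app (an assumed naming scheme).
INDIVIDUAL_VARS: Set[str] = {"smoker", "mental_distress", "excessive_drinking"}


def top_by_magnitude(coefs: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k coefficients with the largest absolute value."""
    ranked = sorted(coefs.items(), key=lambda item: abs(item[1]), reverse=True)
    return dict(ranked[:k])


def partition(
    selected: Dict[str, float], individual_vars: Set[str]
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Split the selected coefficients into county-level and individual-level groups."""
    county = {n: c for n, c in selected.items() if n not in individual_vars}
    individual = {n: c for n, c in selected.items() if n in individual_vars}
    return county, individual


def risk_score(
    county_coefs: Dict[str, float],
    individual_coefs: Dict[str, float],
    county_values: Dict[str, float],
    patient_values: Dict[str, float],
) -> float:
    """Combine both groups of values into a single linear score."""
    score = sum(c * county_values[n] for n, c in county_coefs.items())
    score += sum(c * patient_values[n] for n, c in individual_coefs.items())
    return score


selected = top_by_magnitude(COEFFICIENTS, k=4)
county_coefs, individual_coefs = partition(selected, INDIVIDUAL_VARS)

# County values would be looked up from the patient's address;
# patient values come straight from the app's questions.
score = risk_score(
    county_coefs,
    individual_coefs,
    county_values={"pct_drive_alone": 1.0, "pct_free_lunch": -0.5},
    patient_values={"smoker": 1.0, "mental_distress": 0.0},
)
```

Keeping the two groups separate mirrors how the app gets its inputs: one group is filled in by a lookup keyed on the address, the other by the person using the app.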
Building a custom web interface with scoring powered by an Alteryx App on the Gallery
We wanted to build a lightweight, clean interface that could be easily accessed and used on different devices. To accomplish this goal, we (mainly Mac) borrowed a much-simplified version of the architecture used for the Alteryx Election 2016 App, which leveraged the Alteryx Gallery API.
The Alteryx Gallery API allows you to build your own web pages (or any other application or script that can connect to REST APIs, such as Python or Ruby) designed to look exactly the way you want, while harnessing the power of Alteryx to do complex tasks. This is a neat trick that others have slowly started to notice — give it a try so you can impress your friends at parties! (Well, maybe not... unless that party is Inspire, in which case this is a sure way to impress your fellow party-goers!)
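To give a rough feel for what a client does, here is a small Python sketch that prepares a request to run an app on the Gallery. Treat every detail as an assumption to check against the Gallery API documentation: the URL path, the payload shape (a list of question name/value pairs), and the question names are illustrative guesses, not taken from our app, and request signing and authentication are left out entirely.

```python
import json
from typing import Dict


def job_url(base_url: str, app_id: str) -> str:
    """Build the URL for queuing a job for an app (assumed path layout)."""
    return f"{base_url.rstrip('/')}/v1/workflows/{app_id}/jobs/"


def build_job_payload(answers: Dict[str, str]) -> str:
    """Serialize the app's answers as a JSON body (assumed payload shape)."""
    questions = [{"name": name, "value": value} for name, value in answers.items()]
    return json.dumps({"questions": questions})


# Hypothetical question names and values for a risk-scoring app.
body = build_job_payload(
    {
        "Address": "123 Main St, Springfield",
        "Smoker": "True",
        "MentalDistress": "False",
        "ExcessiveDrinking": "False",
    }
)
url = job_url("https://gallery.example.com/api/", "APP_ID_PLACEHOLDER")
```

A web page would send this body to the URL with an authenticated POST, poll for the job to finish, and then fetch the output to display the score.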
Having an example made setting this up relatively simple, and allowed us to build out the components to our liking within a few hours. It was one of the last things to come together, so having a precedent was a huge help as we burned through the early morning hours.
If there's enough interest, we'll make the code available and write up a walk-through in a future blog post, but in the meantime, here's an overview of the components used and how they fit together to give a better sense of how it works.