SteveBrown-Alteryx
Alteryx Alumni (Retired)

Healthcare providers deal with numerous insurers, but many providers don’t have a sound understanding of their payment streams from insurers.  This leaves them at a disadvantage when attempting to forecast future revenues and unprepared when negotiating future contracts with insurers. 

 

More specifically, healthcare providers need to know what each insurer is paying for the various services they provide so they can predict future revenue from insurers based on what they’ve billed them.   

 

When it comes to negotiating contracts with insurers, Tracy Watrous of the Medical Group Management Association advises: “To start this process, provider organizations should start to analyze fee schedules and payment processes to determine the performance of each payer. Providers should focus on the rates for the organization’s most commonly billed services.” [1]

 

Alteryx Machine Learning can help healthcare finance professionals employ the power of machine learning to better understand their insurer revenue streams and make predictions about future revenue.  They can also use the knowledge they gain to better position themselves for negotiations when insurer contracts are up for renewal. 

 

Use Case 

 

Jonah is a financial data analyst for a mid-sized hospital. His leadership team has asked him to forecast insurer payments for procedures that have been billed but not yet paid. They would also like him to assess the payout performance of the insurers relative to each other.

 

He has assembled data on the procedures his hospital has billed insurers for and what the insurers paid.  He’ll use Alteryx Machine Learning to gain insights about the data and build a machine learning model to predict insurer payments.  He also wants to determine which insurers are high-payers and which are low-payers.  His data, a copy of which is attached, contains the following information: 

  • Paid_Insurance – what the insurer paid for a procedure
  • Patient_Pay – the amount the patient paid for a procedure
  • Charge_Insurance – what the hospital billed the insurer
  • Diagnosis_Code – an identifier for the procedure performed
  • Payor – the insurer that was billed for the procedure

 

Data Preparation and Exploration 

 

Jonah creates a new Project in Alteryx Machine Learning and loads his data. On the Problem Setup screen, he turns on data profiling to see the distribution of values in each column and the data types Alteryx Machine Learning has inferred (e.g., numeric vs. categorical). Because he wants to model how much an insurer paid, he selects ‘Paid_Insurance’ as the target variable. Alteryx Machine Learning recommends a Regression method, which makes sense to Jonah because his target variable is numeric.
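To make that recommendation concrete, here is a rough Python sketch of the underlying logic; it is not Alteryx Machine Learning's actual implementation. It labels a column Double if every value parses as a number and Categorical otherwise, then suggests Regression for a numeric target. The toy rows and the helper names `infer_type` and `recommend_problem_type` are hypothetical.

```python
# Hypothetical toy rows that reuse the column names from Jonah's data.
rows = [
    {"Paid_Insurance": 812.5, "Patient_Pay": 40.0, "Charge_Insurance": 1200.0,
     "Diagnosis_Code": "DC_48", "Payor": "P_A"},
    {"Paid_Insurance": 1500.0, "Patient_Pay": 25.0, "Charge_Insurance": 2100.0,
     "Diagnosis_Code": "DC_50", "Payor": "P_D"},
]


def infer_type(values):
    """Return 'Double' if every value parses as a number, otherwise 'Categorical'."""
    try:
        for value in values:
            float(value)
    except (TypeError, ValueError):
        return "Categorical"
    return "Double"


def recommend_problem_type(rows, target):
    """Suggest regression for a numeric target and classification otherwise."""
    target_type = infer_type([row[target] for row in rows])
    return "Regression" if target_type == "Double" else "Classification"


column_types = {col: infer_type([row[col] for row in rows]) for col in rows[0]}
print(column_types)
print(recommend_problem_type(rows, "Paid_Insurance"))  # Regression
```

Real type inference also weighs things like the number of distinct values (a numeric column holding only 0 and 1 is often better treated as categorical); this sketch checks parsing only.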

 


 

He studies the distribution of each column and notices that DC_48 is the most common diagnosis code and that P_A is the most common insurer (Payor). He will pay close attention to both as he proceeds with his project. He decides not to drop any columns, since all are relevant to his problem. He also confirms that the data types Alteryx Machine Learning inferred for his columns (Double, Double, Double, Categorical, Categorical) are correct for his business context.
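The profile Jonah reads off the Problem Setup screen boils down to simple counts. The sketch below, using hypothetical toy values and a made-up helper `profile_categorical`, finds the most common value of a categorical column, its share of the rows, and the number of distinct values.

```python
from collections import Counter

# Hypothetical toy values; the real dataset is the attachment mentioned above.
diagnosis_codes = ["DC_48", "DC_50", "DC_48", "DC_12", "DC_48", "DC_50"]
payors = ["P_A", "P_B", "P_A", "P_D", "P_C", "P_A"]


def profile_categorical(values):
    """Return the most common value, its share of rows, and the number of distinct values."""
    counts = Counter(values)
    top_value, top_count = counts.most_common(1)[0]
    return {"top": top_value, "share": top_count / len(values), "distinct": len(counts)}


print(profile_categorical(diagnosis_codes))
print(profile_categorical(payors))
```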

 

Data Insights 

 

Jonah proceeds to the Data Insights panel to learn how the columns in his data are correlated and whether there are problematic outliers. He sees a high correlation (0.94) between Charge_Insurance and Paid_Insurance, which does not surprise him. But he is more interested in how Diagnosis_Code and Payor each relate to his target, Paid_Insurance. He is glad to see correlations there as well, and he looks forward to learning more about them during the modeling process.
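The document does not say which measures the Data Insights panel uses. The sketch below assumes Pearson correlation for two numeric columns (such as Charge_Insurance and Paid_Insurance) and the correlation ratio (eta) for a categorical column (such as Payor) against a numeric target; eta compares the group means with the overall spread. Both functions and the toy values are illustrative only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_ratio(categories, values):
    """Correlation ratio (eta): how much of the spread in `values` the category means explain."""
    overall_mean = sum(values) / len(values)
    groups = {}
    for category, value in zip(categories, values):
        groups.setdefault(category, []).append(value)
    between = sum(len(g) * (sum(g) / len(g) - overall_mean) ** 2 for g in groups.values())
    total = sum((v - overall_mean) ** 2 for v in values)
    return math.sqrt(between / total)


# Hypothetical toy values, not Jonah's dataset.
charge = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0, 3500.0]
paid = [650.0, 1000.0, 1300.0, 1750.0, 1900.0, 2300.0]
payor = ["P_A", "P_A", "P_B", "P_B", "P_D", "P_D"]

print(round(pearson(charge, paid), 3))
print(round(correlation_ratio(payor, paid), 3))
```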

 


  

Next, he checks for outliers. Most of the identified outliers are legitimate values in his business context, except one: the ‘Charge_Insurance’ value in row 2748. Compared to other rows with Diagnosis_Code ‘DC_50’, that value is far outside his hospital’s acceptable range for the procedure, so he drops the row; including non-representative data would not benefit his model.
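Jonah's check compares a value against other rows with the same Diagnosis_Code. The sketch below automates that comparison with Tukey's IQR fences computed within each diagnosis code. The outlier method Alteryx Machine Learning uses is not described here, so this rule, the 1.5 multiplier, the minimum group size, and the toy rows are all assumptions.

```python
import statistics
from collections import defaultdict


def iqr_fences(values, k=1.5):
    """Tukey fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers_by_group(rows, group_col, value_col, k=1.5, min_rows=4):
    """Flag rows whose value lies outside the fences computed within their own group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])
    # Groups too small to estimate quartiles are skipped.
    fences = {g: iqr_fences(v, k) for g, v in groups.items() if len(v) >= min_rows}
    flagged = []
    for row in rows:
        if row[group_col] in fences:
            low, high = fences[row[group_col]]
            if not low <= row[value_col] <= high:
                flagged.append(row)
    return flagged


# Hypothetical toy rows; "row" is an illustrative row number, not from the real data.
dc50_charges = [1180.0, 1225.0, 1190.0, 1250.0, 1210.0, 9800.0]
rows = [{"row": i + 1, "Diagnosis_Code": "DC_50", "Charge_Insurance": c}
        for i, c in enumerate(dc50_charges)]
rows += [{"row": 7 + i, "Diagnosis_Code": "DC_48", "Charge_Insurance": c}
         for i, c in enumerate([640.0, 655.0, 630.0])]

flagged = flag_outliers_by_group(rows, "Diagnosis_Code", "Charge_Insurance")
print([r["row"] for r in flagged])
```

Flagging is only the first step: as in Jonah's case, deciding whether a flagged value is an error or a legitimate extreme still takes business knowledge.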

 


 

Model Setup 

 

Jonah opens the Model Setup panel.  Since he is not a machine learning expert, he chooses to accept the default settings, and observes that the default value for the holdout set is 20% of the original dataset.  The holdout set will be used to evaluate model performance later in the project. 
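A minimal sketch of a 20% holdout split, assuming a simple seeded random shuffle (the product may split differently, for example by stratifying); `holdout_split` is a hypothetical helper. The holdout rows are set aside and not used while the models are trained.

```python
import random


def holdout_split(rows, holdout_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split off a holdout set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


train, holdout = holdout_split(range(100))
print(len(train), len(holdout))  # 80 20
```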

 

Feature Engineering 

 

Jonah proceeds to the Primitives tab to see whether feature engineering might uncover additional signals in his data. Feature engineering is the process of using domain knowledge to derive new features (characteristics, properties, attributes) from raw data, with the goal of improving the results of a machine learning process compared with supplying only the raw data. Primitives are the data operations used to create new features. He reviews the available primitives but, because he first wants to see how well the raw data alone can be modeled, he selects none for now and plans to return to this panel if the modeling results are not satisfactory.
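To illustrate what primitives do, the sketch below defines two hypothetical ones in plain Python: a row-wise transform that divides one column by another, and an aggregation that attaches each payor's mean charge to every row. These are illustrations only, not the primitives Alteryx Machine Learning offers, and the toy rows are made up.

```python
from collections import defaultdict

# Hypothetical toy rows.
rows = [
    {"Patient_Pay": 50.0, "Charge_Insurance": 1000.0, "Payor": "P_A"},
    {"Patient_Pay": 30.0, "Charge_Insurance": 1500.0, "Payor": "P_A"},
    {"Patient_Pay": 20.0, "Charge_Insurance": 2000.0, "Payor": "P_D"},
]


def ratio_primitive(row, numerator, denominator):
    """Row-wise transform: one column divided by another (None if the denominator is 0)."""
    return row[numerator] / row[denominator] if row[denominator] else None


def group_mean_primitive(rows, group_col, value_col):
    """Aggregation: mean of value_col within each group, mapped back onto every row."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        sums[r[group_col]] += r[value_col]
        counts[r[group_col]] += 1
    return [sums[r[group_col]] / counts[r[group_col]] for r in rows]


mean_charges = group_mean_primitive(rows, "Payor", "Charge_Insurance")
for row, mean_charge in zip(rows, mean_charges):
    row["Patient_Pay_Ratio"] = ratio_primitive(row, "Patient_Pay", "Charge_Insurance")
    row["Payor_Mean_Charge"] = mean_charge

print(rows)
```

The aggregation runs on an input column rather than the target; averaging Paid_Insurance per group and feeding it back in as a feature would leak the answer into the model.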

 


 

Modeling 

 

Jonah starts the automated modeling process by clicking ‘Next’. Alteryx Machine Learning then runs a suite of modeling algorithms to find the best one and reports the results on the Leaderboard. The Random Forest Regressor model performs best on the R-squared (R²) metric. He clicks ‘Learn More’ to educate himself on the various ranking metrics. The Random Forest Regressor also performs well on most other regression metrics, such as Mean Squared Error (MSE) and Explained Variance, which increases his confidence in the model.
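The Leaderboard metrics follow standard definitions. The sketch below implements three of them from scratch on hypothetical toy values: MSE (lower is better), R² (1 minus the residual sum of squares over the total sum of squares; higher is better) and explained variance (1 minus the variance of the residuals over the variance of the target). That the product computes them exactly this way is an assumption.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def explained_variance(y_true, y_pred):
    """1 - Var(residuals) / Var(target); a constant bias in the residuals is not penalized."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_r = sum(residuals) / len(residuals)
    var_r = sum((r - mean_r) ** 2 for r in residuals) / len(residuals)
    mean_t = sum(y_true) / len(y_true)
    var_t = sum((t - mean_t) ** 2 for t in y_true) / len(y_true)
    return 1 - var_r / var_t


# Hypothetical payments and predictions.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]
y_biased = [t + 50.0 for t in y_true]  # every prediction 50 too high

for name, preds in [("close", y_pred), ("biased", y_biased)]:
    print(name, mse(y_true, preds), r2(y_true, preds), explained_variance(y_true, preds))
```

The biased case shows why the metrics can disagree: a constant error of 50 leaves explained variance at 1.0 because the residuals do not vary, while R² drops to 0.8 and MSE rises to 2,500.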

 


 

He clicks ‘Next’ to evaluate the Random Forest Regressor against the holdout data. On the Performance tab, he compares the model’s performance on the holdout set with its performance on the cross-validation data during the AutoModel step. The model performs well on the holdout set, even better on most metrics than it did during cross-validation. If he applies the model to new data with a profile similar to the data he built it with, the errors of his predictions should fall within the range these metrics indicate.
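The comparison Jonah makes can be sketched as k-fold cross-validation on the training rows (every training row is validated exactly once), followed by a single score on the untouched holdout rows. The fold count (k = 5), the helper names, the toy target, and the stand-in "model" that simply predicts the training mean are all assumptions chosen to keep the example self-contained.

```python
import random


def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_idx, validation_idx) pairs so every row is validated exactly once."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]


def mean_baseline_mse(y, train_idx, val_idx):
    """Stand-in model: predict the training mean and score it with MSE on unseen rows."""
    mean_train = sum(y[j] for j in train_idx) / len(train_idx)
    return sum((y[j] - mean_train) ** 2 for j in val_idx) / len(val_idx)


# Hypothetical target: rows 0-79 form the training set, rows 80-99 the holdout set.
y = [float(v % 17) for v in range(100)]
train_idx, holdout_idx = list(range(80)), list(range(80, 100))

cv_scores = [mean_baseline_mse(y, tr, va) for tr, va in k_fold_indices(len(train_idx))]
cv_mse = sum(cv_scores) / len(cv_scores)
holdout_mse = mean_baseline_mse(y, train_idx, holdout_idx)
print(f"cross-validation MSE: {cv_mse:.2f}, holdout MSE: {holdout_mse:.2f}")
```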

 


 

He moves to the Insights tab to see the factors that influence Paid_Insurance. Unsurprisingly, Charge_Insurance is the most important feature. He is also very interested in Payor, which registers as an important variable and represents something his organization can influence during contract negotiations.
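The document does not say how Alteryx Machine Learning computes feature importance. One common, model-agnostic approach is permutation importance: shuffle one column, re-score the model, and measure how much the error grows. The sketch below applies it to a hypothetical stand-in model (not a random forest) that uses Charge_Insurance and Payor but ignores Patient_Pay, so Patient_Pay comes out with zero importance.

```python
import random

# Hypothetical stand-in for a trained model: the prediction depends on
# Charge_Insurance and Payor and ignores Patient_Pay.
PAYOR_OFFSET = {"P_A": -100.0, "P_B": 0.0, "P_D": 150.0}


def model_predict(row):
    return 0.7 * row["Charge_Insurance"] + PAYOR_OFFSET[row["Payor"]]


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(rows, y, feature, seed=0):
    """Increase in MSE after shuffling one column, which breaks its link to the target."""
    baseline = mse(y, [model_predict(r) for r in rows])
    values = [r[feature] for r in rows]
    random.Random(seed).shuffle(values)
    permuted = [{**r, feature: v} for r, v in zip(rows, values)]
    return mse(y, [model_predict(r) for r in permuted]) - baseline


payors = ["P_A", "P_B", "P_D"]
rows = [{"Charge_Insurance": 500.0 + 100.0 * i, "Payor": payors[i % 3], "Patient_Pay": 20.0 + i}
        for i in range(30)]
y = [model_predict(r) for r in rows]  # hypothetical targets that the stand-in fits exactly

for feature in ["Charge_Insurance", "Payor", "Patient_Pay"]:
    print(feature, round(permutation_importance(rows, y, feature), 1))
```

Because a single shuffle is random, practical implementations usually repeat it several times and average the results.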

 


 

Digging deeper, he looks at the partial dependence of his target, Paid_Insurance, on the other columns. For Charge_Insurance, the insurer payout seems to level off as Charge_Insurance approaches $4,000. For Payor, he sees that insurer P_D pays out significantly more than the other four, while insurer P_A pays out significantly less. Patient_Pay and Diagnosis_Code are not strong indicators, so he gives them no further consideration.
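Partial dependence answers the question: what does the model predict, on average, if every row had value v for this column? The sketch below computes it by brute force for a numeric and a categorical column. The stand-in model, its cap at 3,000 (chosen arbitrarily so the curve visibly flattens), the payor offsets, and the toy rows are hypothetical.

```python
# Hypothetical stand-in model; the cap at 3000 is arbitrary and only shows
# what a flattening partial-dependence curve looks like.
PAYOR_OFFSET = {"P_A": -100.0, "P_B": 0.0, "P_D": 150.0}


def model_predict(row):
    return 0.7 * min(row["Charge_Insurance"], 3000.0) + PAYOR_OFFSET[row["Payor"]]


def partial_dependence(rows, feature, grid):
    """Average prediction when every row's `feature` is set to each grid value in turn."""
    curve = []
    for value in grid:
        preds = [model_predict({**row, feature: value}) for row in rows]
        curve.append((value, sum(preds) / len(preds)))
    return curve


rows = [
    {"Charge_Insurance": 1200.0, "Payor": "P_A"},
    {"Charge_Insurance": 2600.0, "Payor": "P_B"},
    {"Charge_Insurance": 900.0, "Payor": "P_D"},
    {"Charge_Insurance": 3400.0, "Payor": "P_B"},
]

charge_curve = partial_dependence(rows, "Charge_Insurance", [1000.0, 2000.0, 3000.0, 4000.0, 5000.0])
payor_curve = dict(partial_dependence(rows, "Payor", ["P_A", "P_B", "P_D"]))
print(charge_curve)
print(payor_curve)
```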

 


 


Export and Predict 

 

Now that Jonah has a model for predicting expected insurer payments, he imports a new dataset for prediction that contains, for each record, the amount the patient paid, the billed charge, the diagnosis code, and the insurer billed. Because the prediction data must contain at least the columns used during modeling, he makes sure the column headers are Patient_Pay, Charge_Insurance, Diagnosis_Code, and Payor, so that Alteryx Machine Learning can match them to the model and make predictions.
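A quick pre-flight check for this requirement might look like the sketch below, which reads the header of a hypothetical CSV and lists any model column that is missing. The file contents, the `Claim_ID` column, and the helper name are assumptions, and the sketch assumes names must match exactly, including case.

```python
import csv
import io

# Columns used as model inputs; the target, Paid_Insurance, is not needed for prediction.
REQUIRED_COLUMNS = ["Patient_Pay", "Charge_Insurance", "Diagnosis_Code", "Payor"]


def missing_columns(header, required=REQUIRED_COLUMNS):
    """Return required columns absent from the header (exact, case-sensitive match)."""
    present = set(header)
    return [col for col in required if col not in present]


# Hypothetical prediction file; the extra Claim_ID column is simply not checked.
sample = "Claim_ID,Patient_Pay,Charge_Insurance,Diagnosis_Code,Payor\n17,25.0,1800.0,DC_48,P_A\n"
header = next(csv.reader(io.StringIO(sample)))
print(missing_columns(header))  # []
print(missing_columns(["Patient_Pay", "Charges", "Diagnosis_Code", "Payor"]))  # ['Charge_Insurance']
```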

 


 

He downloads the results and uses them to prepare a revenue forecast for his management. 

 

Further, he writes a report calling out the relative payout levels of the five insurers his hospital deals with. He is confident that his management team will be pleased to have that information heading into contract negotiations.
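One simple way to summarize relative payout levels directly from historical data is the share of billed charges each insurer actually paid. The sketch below computes that ratio per payor on hypothetical toy rows; it complements, rather than reproduces, the partial dependence analysis Jonah used, and `payout_ratios` is a made-up helper.

```python
from collections import defaultdict


def payout_ratios(rows):
    """Total paid / total charged per payor, sorted from highest to lowest ratio."""
    paid, charged = defaultdict(float), defaultdict(float)
    for row in rows:
        paid[row["Payor"]] += row["Paid_Insurance"]
        charged[row["Payor"]] += row["Charge_Insurance"]
    ratios = {p: paid[p] / charged[p] for p in charged if charged[p] > 0}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy rows, not figures from Jonah's data.
rows = [
    {"Payor": "P_A", "Charge_Insurance": 1000.0, "Paid_Insurance": 600.0},
    {"Payor": "P_A", "Charge_Insurance": 1000.0, "Paid_Insurance": 600.0},
    {"Payor": "P_B", "Charge_Insurance": 2000.0, "Paid_Insurance": 1400.0},
    {"Payor": "P_D", "Charge_Insurance": 1000.0, "Paid_Insurance": 900.0},
]

for payor, ratio in payout_ratios(rows):
    print(f"{payor}: {ratio:.0%} of billed charges paid")
```

Raw ratios do not adjust for procedure mix: an insurer billed mostly for procedures that pay out poorly will look worse than it is. Partial dependence averages over the same mix of other column values for every payor, which makes it less sensitive to such differences.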

 

Summary 

 

Jonah used Alteryx Machine Learning to gain actionable insights quickly and easily, without prior knowledge of the intricacies of machine learning. He has created a model his hospital can use to predict future insurer revenue, and he has gained important insights about the insurers’ payment levels that will help during contract negotiations, especially with insurer P_A.

 

Sources 

  1. https://revcycleintelligence.com/features/maximizing-provider-revenue-with-payer-contract-management