Cash inflow is the single largest factor that determines whether a business can continue to operate, but what do we do when our loyal customers pay late? We could hound them with invoices, extend them credit until they are able to pay, or simply do nothing and wait. Alteryx Machine Learning (AYX ML) offers a different path forward [1]. Harnessing the power of machine learning in an easy-to-use, no-code environment, we can now predict which of our customers are likely to pay late and by how many days. For the purposes of this blog, we focus only on the first question: which accounts are likely to pay late?

 

Topic Expertise

 

When beginning to address any problem with machine learning, it is helpful to find experts who can provide the contextual details needed to address the problem at hand. With this in mind, we reached out to our colleagues in the Accounts Receivable department to begin our investigation into which customers here at Alteryx are likely to pay late and by how much. In our discussions, we learned more about their current processes, how many of our customers have outstanding balances at a given point in time, how long a balance typically stays overdue, and which variables they look at when deciding which customers to extend credit to prior to payment.

 

Data Acquisition

 

Before jumping into the analysis, let’s briefly discuss the data we used. In addition to looking at which accounts were past due and by how many days, we also delved into account-level information to build our machine learning models. The additional data points we collected for each customer include, but are not limited to, the following:

  • Account sales region (e.g., North America, LATAM, APAC, etc.)
  • Account sales segment (e.g., public sector, small business, etc.)
  • Account tenure
  • Number of active contracts with Alteryx
  • Industry (e.g., manufacturing, tech, etc.)
  • Global 2K membership
  • Month of past due collection
  • Amount overdue
  • Payment method (e.g., check, wire, etc.)

 

Based on our discussions with the Accounts Receivable colleagues, we thought this was a good place to start.

 

Data Preparation and Exploration

 

Before building the machine learning models, we wanted to assess the quality of data going into the model. Thankfully, AYX ML provides us with data health scores based upon factors such as outlier data, percent of our data that is null across rows and columns, and the distribution of values within our data features.

 

curtisburkhalter_0-1664400899459.png

 

As you can see, our data was of pretty good quality and ready for modeling in AYX ML.
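To make the idea of a data health check more concrete, here is a minimal sketch in plain Python of two of the factors mentioned above: the percentage of null values per column and a simple outlier screen. This is not how AYX ML computes its scores. The column names, the sample records, and the choice of Tukey's IQR rule are all assumptions made for illustration.

```python
from statistics import quantiles

# Hypothetical account records (not real Alteryx data); None marks a missing value.
rows = [
    {"tenure_years": 1.0, "active_contracts": 1, "amount_overdue": 1200.0},
    {"tenure_years": 3.5, "active_contracts": 2, "amount_overdue": None},
    {"tenure_years": 2.0, "active_contracts": None, "amount_overdue": 900.0},
    {"tenure_years": 4.0, "active_contracts": 3, "amount_overdue": 1500.0},
    {"tenure_years": 2.5, "active_contracts": 2, "amount_overdue": 98000.0},
    {"tenure_years": 3.0, "active_contracts": 1, "amount_overdue": 1100.0},
]


def percent_null(rows, column):
    """Share of rows (0-100) in which the column is missing."""
    missing = sum(1 for row in rows if row[column] is None)
    return 100.0 * missing / len(rows)


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule, an assumed choice)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


for column in rows[0]:
    present = [row[column] for row in rows if row[column] is not None]
    print(column, f"{percent_null(rows, column):.1f}% null",
          "outliers:", iqr_outliers(present))
```

A real health score would combine several such signals; this sketch only shows the raw ingredients.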

 

In addition to an overall data quality score, AYX ML provides several exploratory data analysis features that help us understand both the strength and the direction of the relationships between our data points. These features include correlation matrices, bivariate relationship plots, and outlier plots.

 

curtisburkhalter_1-1664400899497.png

Figure 1. Correlation matrix of the various features used to build the model

 

curtisburkhalter_2-1664400899520.png

Figure 2. Bivariate plot showing the possible relationship between # of employees and whether an account was past due

 

curtisburkhalter_3-1664400899579.png

Figure 3. Outlier plots for various features used to build machine learning models

Using these exploratory data analysis tools, we were able to decide whether certain variables should be dropped from further analysis because they share a high proportion of mutual information, whether an outlier should be removed, and whether we should be concerned about factors such as class imbalance in our target variable.
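The decisions described above can be approximated outside the tool as well. The sketch below flags pairs of features that carry nearly the same information and reports the class shares of the target. Note that it uses Pearson correlation as a simple stand-in for mutual information, and that the feature values, the `annual_revenue` column, and the 0.9 threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def redundant_pairs(features, threshold=0.9):
    """Feature pairs whose absolute correlation meets the (assumed) threshold."""
    return [
        (a, b)
        for a, b in combinations(features, 2)
        if abs(pearson(features[a], features[b])) >= threshold
    ]


def class_shares(labels):
    """Fraction of rows per target class, to spot class imbalance."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


# Hypothetical values for illustration only.
features = {
    "employees": [50, 200, 1000, 5000, 120],
    "annual_revenue": [5, 21, 98, 510, 13],
    "tenure_years": [3, 1, 4, 2, 5],
}
print(redundant_pairs(features))
print(class_shares(["late", "on_time", "on_time", "on_time"]))
```

When a pair is flagged, dropping one of the two features is usually enough; which one to keep is a judgment call best made with the subject matter experts.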

 

In addition to working with the original data, AYX ML provides automated feature engineering. Feature engineering is typically a time-consuming process, but with a few clicks we were able to create a number of new features from our existing data to use in our predictive models. Not only did this save us a lot of time, but the engineered features also improved the overall performance of our models.

 

curtisburkhalter_4-1664400899660.png

Figure 4. Automated feature engineering interface in AYX ML
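For readers curious about what an engineered feature looks like, here is a small hand-written sketch of two ratio features of the kind that turn up later in this post. AYX ML generates its features automatically, so this is only an illustration. The field names and the choice of years as the tenure unit are assumptions.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the ratio is undefined."""
    if numerator is None or denominator in (None, 0):
        return None
    return numerator / denominator


def add_ratio_features(account):
    """Return a copy of the account with two derived ratio features added."""
    enriched = dict(account)
    # Active contracts relative to how long the account has existed.
    enriched["contracts_per_tenure_year"] = safe_ratio(
        account.get("active_contracts"), account.get("tenure_years")
    )
    # Designer seats relative to the number of distinct products purchased.
    enriched["designer_seats_per_product"] = safe_ratio(
        account.get("designer_seats"), account.get("distinct_products")
    )
    return enriched


# Hypothetical account for illustration only.
account = {
    "active_contracts": 2,
    "tenure_years": 4.0,
    "designer_seats": 6,
    "distinct_products": 3,
}
print(add_ratio_features(account))
```

Returning `None` for an undefined ratio keeps brand-new accounts (zero tenure) in the data set and leaves the gap for a later imputation step to handle.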

 

Predictive Modeling and Evaluation

 

Now that we had a good handle on the data, we could jump into the analysis. The great thing about AYX ML is that it is a fully automated machine learning solution. As such, it performed one-hot encoding of our categorical features, imputed missing values in our data, performed hyperparameter tuning, label-encoded our target variable, and built and scored multiple models. The output from the automated modeling shows the performance of the model as measured by our selected scoring metric (in this case, the mean F1 score), the number of features used, and the performance of our best model versus a baseline classification model.
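If you have never done these preprocessing steps by hand, the sketch below shows roughly what three of them involve: imputing missing values, one-hot encoding a categorical feature, and label-encoding the target. The source does not say which imputation strategy AYX ML uses, so the median fill here is an assumption, as are the function names and sample values.

```python
from statistics import median


def impute_median(values):
    """Replace None with the median of the observed values (assumed strategy)."""
    present = [v for v in values if v is not None]
    fill = median(present)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Turn one categorical column into a list of 0/1 indicator dicts."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def encode_labels(labels):
    """Map each target class to an integer; returns (encoded, mapping)."""
    mapping = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


# Hypothetical values for illustration only.
print(impute_median([1200.0, None, 900.0]))
print(one_hot(["wire", "check", "wire"]))
print(encode_labels(["on_time", "late", "late"]))
```

Sorting the categories before encoding keeps the column order stable from run to run, which matters when the same encoding has to be reapplied to new accounts.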

 

More technical users can still customize parts of the model configuration, such as the percentage of data used for holdout evaluation, the number of folds used in cross-validation, and even one-click model ensembling, which builds an ensemble of all the models and averages their predictions.
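Two of these settings are easy to picture in code. The sketch below splits row indices into cross-validation folds and averages the predicted probabilities of several models. It is a simplified illustration, not AYX ML's implementation: the interleaved fold assignment and the unweighted average are assumptions.

```python
def k_fold_indices(n_rows, n_folds):
    """Yield (train, validation) index lists; each row is validated exactly once."""
    indices = list(range(n_rows))
    for fold in range(n_folds):
        validation = indices[fold::n_folds]  # every n_folds-th row, offset by fold
        held_out = set(validation)
        train = [i for i in indices if i not in held_out]
        yield train, validation


def average_ensemble(model_probabilities):
    """Average per-account probabilities across models (one list per model)."""
    n_models = len(model_probabilities)
    return [sum(column) / n_models for column in zip(*model_probabilities)]


for train, validation in k_fold_indices(n_rows=6, n_folds=3):
    print("train:", train, "validation:", validation)

# Hypothetical late-payment probabilities from two models for two accounts.
print(average_ensemble([[0.25, 0.75], [0.75, 0.25]]))
```

In practice the rows would be shuffled (and often stratified by the target) before being assigned to folds.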

 

curtisburkhalter_5-1664400899731.png

Figure 5. Model output interface in AYX ML

 

We can see that our F1 score, which balances precision and recall in classification scenarios, was 0.28. The best model, the XGBoost classifier, performed almost 28% better than the baseline classifier.
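As a reminder of how the metric behaves, the F1 score is the harmonic mean of precision and recall, so a weak value on either side pulls the score down. The sketch below computes it from raw counts. The counts are made up for illustration and are not the results of our model.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall, computed from raw counts."""
    if true_positives == 0:
        return 0.0  # no correct positive predictions means F1 is zero
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: high recall but low precision.
print(round(f1_score(true_positives=20, false_positives=80, false_negatives=5), 2))
```

With these made-up counts, recall is 0.8 but precision is only 0.2, and the F1 score ends up much closer to the weaker of the two.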

 

If we delve deeper into the model performance by looking at our confusion matrix, we can see that the F1 score is heavily affected by false positives. The model did well at identifying accounts that are not likely to pay late, as indicated by our false negative ratio, but unfortunately it flagged accounts as likely to pay late at a slightly higher rate than we would like.

 

curtisburkhalter_6-1664400899760.png

Figure 6. Confusion matrix output in AYX ML
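For readers who want to reproduce this kind of breakdown on their own predictions, here is a minimal sketch that tallies a confusion matrix and derives the false positive and false negative rates. The labels and predictions are hypothetical, and treating "late" as the positive class is an assumption.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive="late"):
    """Tally true/false positives and negatives for a binary target."""
    counts = Counter()
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def error_rates(counts):
    """Return (false positive rate, false negative rate).

    Assumes both classes appear at least once in the actual labels.
    """
    false_positive_rate = counts["fp"] / (counts["fp"] + counts["tn"])
    false_negative_rate = counts["fn"] / (counts["fn"] + counts["tp"])
    return false_positive_rate, false_negative_rate


# Hypothetical labels for illustration only.
actual = ["late", "late", "on_time", "on_time", "on_time", "on_time"]
predicted = ["late", "late", "late", "late", "on_time", "on_time"]
counts = confusion_counts(actual, predicted)
print(dict(counts), error_rates(counts))
```

In this toy example no late payer is missed, but half of the on-time accounts are flagged, which is the same pattern of over-flagging described above.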

 

While all these numbers are informative from a model evaluation standpoint, what we really wanted was to provide actionable insights for our Accounts Receivable colleagues. To begin doing that, we looked at the feature importance in the model. This chart reveals that the features most important for predicting whether an account is likely to pay late include:

 

  • The ratio of active contracts to account tenure
    • ~10% increase in likelihood to pay late when the ratio increases from 0 to 1, but the effect levels off after that
    • Older clients with fewer contracts pay on time more often

 

  • The number of active contracts
    • ~5% increase in likelihood to pay late going from 1 to 3 active contracts, but the effect begins to level off after that
    • Fewer contracts mean paying on time more often

 

  • The ratio of Alteryx Designer seats in an account to the distinct number of products purchased by the account, which gives us an idea of the product mix purchased by a customer
    • ~20% increase in likelihood to pay late as the ratio of Designer seats to distinct SKUs goes from 1 to 7, but the effect flattens beyond that point
    • Accounts with a greater product mix are more likely to pay on time

 

curtisburkhalter_7-1664400899813.png

Figure 7. Feature importance chart in AYX ML

 

curtisburkhalter_8-1664400899831.png

Figure 8. Relationship between the ratio of active contracts to account tenure and the probability of paying late

 

curtisburkhalter_9-1664400899852.png

Figure 9. Relationship between account tenure and probability of paying late

 

curtisburkhalter_10-1664400899870.png

Figure 10. Relationship between the ratio of Alteryx Designer seats to distinct products purchased and the probability of paying late
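Curves like the ones in Figures 8 through 10 resemble what is commonly called a partial dependence plot: set one feature to a fixed value for every account, average the model's predictions, and repeat across a grid of values. Whether AYX ML computes its charts exactly this way is an assumption on our part. The `toy_model` below is a hypothetical stand-in for a trained classifier, and its coefficients are invented.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        modified = [{**row, feature: value} for row in rows]
        predictions = [predict(row) for row in modified]
        curve.append((value, sum(predictions) / len(predictions)))
    return curve


def toy_model(row):
    """Hypothetical classifier: risk rises with the ratio, then saturates at 1."""
    risk = 0.25 + 0.125 * min(row["contracts_per_tenure_year"], 1.0)
    if row["pays_by_check"]:
        risk += 0.125
    return risk


# Hypothetical accounts for illustration only.
accounts = [
    {"contracts_per_tenure_year": 0.2, "pays_by_check": True},
    {"contracts_per_tenure_year": 0.9, "pays_by_check": False},
]
for value, probability in partial_dependence(
    toy_model, accounts, "contracts_per_tenure_year", [0.0, 1.0, 2.0]
):
    print(value, probability)
```

The flat segment beyond a ratio of 1 in this toy output mirrors the "levels off after that" pattern described in the bullets above.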

 

To summarize, the relationships between our most important predictors and the likelihood to pay late provided actionable insights to our Accounts Receivable colleagues. We are now using the model insights to derive new strategies for both account monitoring and overdue balance collection. These results could be further strengthened with additional data inputs and further discussions with our subject matter experts.

 

Finally, one of the greatest strengths of the product is on full display here. We were able to obtain results and insights in a relatively short period of time, going from initial conversations to a minimum viable product (MVP) in about two weeks. AYX ML not only provides the ability to upskill an entire section of your workforce but is also an accelerator for team members who are already more technically proficient. If you have any questions about the product, please reach out!

 

Resources:

  1. https://www.alteryx.com/products/alteryx-machine-learning