Data Science

Garabujo7 · 08-22-2022

Democratization of analytics


The democratization of analytics, or what we at Alteryx have defined as Analytics for everyone, means that anyone can take advantage of the benefits of analytics and process automation, regardless of the area in which they work or whether their professional training is aligned with technology or data science.

As we see in the following graph from @Datavizzdom, the activities of a data scientist go beyond the creation of predictive models. It is in those other tasks (data cleaning and exploration), which consume most of the time, that the Alteryx platform shines.

Source: Twitter

That's why the Alteryx platform offers solutions focused on self-service and productivity. We remove the complex, closed, proprietary systems to make it easy for anyone who has a business question to answer it without becoming a systems or analytics expert.

The idea is that users can more quickly solve the business challenges they face on a daily basis independently and with an easy-to-use, replicable and powerful platform.

This way, they draw on the experience they already have in the business, without spending too much time learning a new technology or programming language, while they begin to improve their processes, discover relevant new insights, and make better data-driven decisions.

Within all the possibilities that Alteryx offers, I will focus on the Machine Learning part this time.

Alteryx Machine Learning

In Alteryx, it is possible to create predictive models in several different ways; the main difference between them is the level of automation and options available for creating the models.

First, we have the predictive palette tools, for which we need to prepare the data, investigate it and select the appropriate predictor variables manually, without wizards to give us suggestions.

Next is the Intelligence Suite add-on that features assisted modeling, which walks us through the entire process after selecting the target variable.

If you want to see more details about this functionality, you can read these articles on assisted modeling that I wrote.

The third, and the one I will talk about in this article, is Alteryx Machine Learning, a cloud platform that allows us to easily create and evaluate predictive models and focus more on the results and their business application than on worrying about the entire process of creation and implementation.

The beginning: Get the data

A substantial part of the work of creating a predictive model is consumed in preparing the data.

This process includes, among others:

Access data from various sources, files, and applications
Clean and standardize it
Format it
Join it (no need to know SQL)

This is where our platform begins to show its potential and ability to do it all in one solution.
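For readers who come from a coding background, the steps listed above can be sketched in a few lines of Python. This is only an illustration of the concepts, not what Designer runs internally; in Designer the same work is done with drag-and-drop tools. The records, the field names (booking_id, hotel, arrival, lead_time, is_canceled), and the helper functions are all hypothetical.

```python
from datetime import datetime

# Hypothetical raw records, standing in for two different data sources.
bookings = [
    {"booking_id": "1", "hotel": " City Hotel ", "arrival": "2022-08-01", "lead_time": "12"},
    {"booking_id": "2", "hotel": "city hotel", "arrival": "2022-08-03", "lead_time": ""},
    {"booking_id": "3", "hotel": "Resort Hotel", "arrival": "2022-08-05", "lead_time": "40"},
]
cancellations = [
    {"booking_id": "2", "is_canceled": "1"},
]


def clean_booking(row):
    """Clean, standardize, and format one raw booking record."""
    return {
        "booking_id": int(row["booking_id"]),
        # Standardize: trim whitespace and unify capitalization.
        "hotel": row["hotel"].strip().title(),
        # Format: parse the ISO date string into a date object.
        "arrival": datetime.strptime(row["arrival"], "%Y-%m-%d").date(),
        # Clean: treat an empty string as a missing value.
        "lead_time": int(row["lead_time"]) if row["lead_time"] else None,
    }


def join_cancellations(rows, cancel_rows):
    """Left-join the cancellation flag onto the bookings by booking_id."""
    canceled = {int(c["booking_id"]): int(c["is_canceled"]) for c in cancel_rows}
    # Bookings without a cancellation record are assumed not canceled (0).
    return [{**r, "is_canceled": canceled.get(r["booking_id"], 0)} for r in rows]


base_table = join_cancellations([clean_booking(b) for b in bookings], cancellations)
for row in base_table:
    print(row)
```

The result is a single base table with one row per reservation, which is the kind of table we would then send to the Machine Learning platform.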

I will not go into the details of the well-known Alteryx Designer and its capabilities. I will only highlight its integration so we can use any data in the Alteryx Machine Learning platform.

Integration with Alteryx Designer

After preparing the data in Alteryx Designer, we need a data input to upload it to the Alteryx Machine Learning platform.

For this example, I will use demand data for hotel reservations. The objective will be to predict which reservations will be canceled and what actions we can take to prevent or anticipate them to mitigate the risk of economic losses for the hotel and to better plan customer demand.

With the Machine Learning Send tool (included), we quickly upload the data to the platform.

The Alteryx Machine Learning Platform

The solution is geared towards the productivity of citizen data scientists.

Most of the process is wizard-driven to simplify many of the iterative and repetitive tasks we have to perform to create a good predictive model.

As we know, predictive modeling is a process that combines art and science. It depends on the decisions of whoever creates the model, combined with their experience in the business and their knowledge of models and statistical techniques. That is why the development can take too long and become enormously complicated.

Instead, the Alteryx platform automates that part of the process. We only select a few parameters, which leaves more time to analyze the results, understand them, justify them, apply them to the business, and, very importantly, explain them to people.

The development of the model is divided into five steps:

Data Preparation
Findings in the Data
Auto Modeling
Model Evaluation
Export and Predict

Going back to the rest of the platform, I would add a sixth step to implement the model in production, either through the web interface or by exposing a REST API so that it can be consumed by third parties.

Contextual Help

Before getting into the matter, an important part of using Alteryx Machine Learning is knowing how to get help.

Throughout the process, the platform offers us contextual help that we can easily consult to understand any step we are taking.

For example, when configuring Auto Model, we can click on the information symbol, and it shows us the explanation of the step we want to select along with a recommendation for use.

In addition to that, if we click on the little book that is in the upper right part of the screen, we can access the education mode. There we can find explanations of all the elements of the platform.

With this functionality, we can understand what the solution does. If we do not know about the metrics and processes it performs, it will be useful to learn more about data science while creating predictive models to solve our business challenges.


The advantage is that if we don't use this feature, we can disable it at any time.

Prep Data

I mentioned at the beginning that some of the data preparation could be done in Designer, especially creating the base table on which we will build the model. In Alteryx ML, we can explore the data to better understand it before creating the predictive models.

First, we have the data, with an option to view the profile, data type, number of rows and columns, as well as the general quality of the data.

If we find an error, the platform notifies us, and we can correct it. For example, we have the ID field, which is not useful for building the predictive model.

Because of that, it shows us a message, and if we click on view details, we can review them at the bottom of the screen.

Here it shows us the finding and the recommended action to take.

By clicking on fix the data, we select the column that we want to clean, and the option to discard it from our analysis appears.
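To see why an ID field is flagged, here is a minimal Python sketch of one possible check. The platform does not document its rule in this article, so the heuristic below (a column whose values are all distinct is ID-like) is an assumption, and the function names and sample rows are hypothetical.

```python
def find_id_like_columns(rows):
    """Return the names of columns whose values are all distinct across rows."""
    if not rows:
        return []
    n = len(rows)
    return [
        col for col in rows[0]
        # A column with as many distinct values as rows behaves like an ID.
        if len({row[col] for row in rows}) == n
    ]


def drop_columns(rows, columns):
    """Return copies of the rows without the given columns."""
    to_drop = set(columns)
    return [{k: v for k, v in row.items() if k not in to_drop} for row in rows]


# Hypothetical sample of the hotel reservation data.
rows = [
    {"ID": 101, "lead_time": 12, "is_canceled": 0},
    {"ID": 102, "lead_time": 12, "is_canceled": 1},
    {"ID": 103, "lead_time": 40, "is_canceled": 0},
]
id_columns = find_id_like_columns(rows)
cleaned = drop_columns(rows, id_columns)
print(id_columns)
print(cleaned)
```

A value that is unique to every row cannot help the model generalize to new reservations, which is why discarding the column is the recommended action. Note that this simple heuristic would also flag a continuous numeric column that happens to have no repeated values, so the finding still deserves a human review.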

Data Health

After exploring our data set, we can review the data health. It focuses on missing values in rows and columns and on outliers.

This data set, for example, has no missing values in rows or columns.

However, we have room for improvement in the distribution per column and in the 83% of columns that have outliers or out-of-range values. Because of this, the data health is rated C.

This information is useful because the distribution and outliers can negatively affect the output of our model.
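As a rough idea of what such a health check involves, the following Python sketch computes the share of missing cells and the share of columns with outliers. The article does not say how the platform detects outliers or how it assigns the letter grade, so the 1.5 × IQR rule (Tukey's fences) used here is a common stand-in, not the platform's method. The column names and values are hypothetical.

```python
from statistics import quantiles


def has_outliers(values, k=1.5):
    """True if any value falls outside Tukey's fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return any(v < low or v > high for v in values)


def health_summary(columns):
    """Summarize missing values and outlier columns for numeric columns.

    `columns` maps a column name to a list of numbers, with None for missing.
    """
    total_cells = sum(len(vals) for vals in columns.values())
    missing_cells = sum(v is None for vals in columns.values() for v in vals)
    outlier_cols = [
        name for name, vals in columns.items()
        # Missing values are ignored when looking for outliers.
        if has_outliers([v for v in vals if v is not None])
    ]
    return {
        "missing_share": missing_cells / total_cells,
        "outlier_columns": outlier_cols,
        "outlier_column_share": len(outlier_cols) / len(columns),
    }


# Hypothetical numeric columns from the reservation data.
columns = {
    "lead_time": [10, 12, 11, 13, 12, 11, 300],
    "adr": [80, 95, 90, 100, 85, 110, 105],
}
summary = health_summary(columns)
print(summary)
```

In this toy data there are no missing values, and one of the two columns (lead_time, because of the value 300) contains an outlier.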

These are among the tasks where we normally have to make decisions and run a multitude of tests to obtain good results from our models. The good thing is that the Alteryx ML platform does that part for us automatically to make the best use of the data.


Findings in the Data

In the next stage:

We will select the target variable
We will choose the machine learning method to use
We will check the correlation in the data with a matrix or a chord diagram
We will explore outliers
We will review the distribution of the target variable

Conclusion

In this first part, we reviewed the beginning of the process of creating a machine learning model:

Get the Data
Integration with Alteryx Designer
Prepare Data
Data Health
Findings

In the next part, we will see:

Target Variable Selection
Machine Learning Methods
Correlations
Outliers
Target Variable Distribution
Model Training
Metrics for Model Evaluation
Feature Engineering

Don't miss the second part of this series.

Read part 2 here.