
Data Science

Machine learning & data science for beginners and experts alike.

In this last part, we will focus on understanding and explaining the model's results, and on putting the model into production so that it starts generating value immediately.


In case you missed them, you can read part 1 and part 2 here.


Auto Modeling


Once we select the parameters of our models, we train them. In my case, I selected an ensemble of models. Once they were trained, we go to the Auto Model section to see their overall results.


The first thing we check is the list of models and their ranking according to the selected metrics.


The platform has ten machine learning techniques available for generating models. This is a huge advantage because we are not dependent on a single model. In addition, it automatically searches for the combination of variables and parameters that gives the best results. This way, we can focus on analyzing the results instead of creating and configuring the models.
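Alteryx does not publish the internals of this search, but the core idea of an automated model leaderboard can be sketched in a few lines of scikit-learn: score several candidate models with a cross-validated metric and rank them. The candidate list, dataset, and metric below are illustrative assumptions, not the platform's actual configuration.

```python
# Illustrative sketch of an automated model leaderboard (not Alteryx's code):
# rank several candidate models by mean cross-validated AUC.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the reservations dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Score every candidate with 5-fold cross-validation, then sort best-first.
leaderboard = sorted(
    ((cross_val_score(m, X, y, cv=5, scoring="roc_auc").mean(), name)
     for name, m in candidates.items()),
    reverse=True,
)
for auc, name in leaderboard:
    print(f"{name}: {auc:.3f}")
```

In a real auto-modeling tool the inner loop would also tune hyperparameters and feature subsets per candidate; the ranking step at the end is the same.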




These are some of the models it trained, along with the selected metrics.




The model that the platform recommends is the following:




If we want to change the evaluation metric, we can do it, and the ranking of the models will change.




Model Evaluation


Now, when we go to the Model Evaluation, the most relevant part of the process, we can see the general results.



Here we are presented with the model's performance in cross-validation and on the holdout data set, as well as the sample size used to train the model.
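Reporting both numbers side by side is standard practice, and a minimal scikit-learn equivalent makes the distinction concrete: the cross-validation score is averaged over folds of the training data, while the holdout score comes from data the model never saw during training. The dataset and split sizes below are assumptions for illustration.

```python
# Sketch of the two scores the platform reports: cross-validation on the
# training portion, plus a separate holdout evaluation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=600, random_state=1)
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.2, random_state=1)

model = RandomForestClassifier(random_state=1)
cv_score = cross_val_score(model, X_train, y_train, cv=5,
                           scoring="roc_auc").mean()
holdout_score = model.fit(X_train, y_train).score(X_hold, y_hold)

print(f"cross-validation AUC: {cv_score:.3f}")
print(f"holdout accuracy:     {holdout_score:.3f}")
print(f"training sample size: {len(X_train)}")
```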




It also presents us with the analytical pipeline used to create each model and all the steps followed.


This is where we can see that the platform automatically performs imputation of missing values. Depending on the data set, it can perform other actions automatically, such as treating outliers or rebalancing an unbalanced target variable.
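An imputation step like the one shown in the platform's pipeline can be sketched with scikit-learn's `Pipeline` and `SimpleImputer`; the imputation strategy and toy data here are assumptions, chosen only to show where the step sits relative to the model.

```python
# Hedged sketch of an automatic preprocessing pipeline: fill missing
# values before the model sees the data.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, 4.0],
              [4.0, 5.0]])
y = np.array([0, 0, 1, 1])

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # replaces NaNs
    ("model", LogisticRegression()),
])
pipe.fit(X, y)

preds = pipe.predict(X)
print(preds)
```

Because the imputer lives inside the pipeline, the same fill values learned on the training data are reused when scoring new data, which avoids leakage.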





Here we begin to review the metrics for each of the models that were created. It shows us the results for both the cross-validation and the holdout data.




The confusion matrix presents the model's ability to correctly predict each category and the number of errors it made, both as a count and as a percentage of the total.
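The two views described here, counts and percentages of the total, are a simple transformation of the same matrix; a small scikit-learn example with made-up labels:

```python
# A confusion matrix as raw counts and as percentages of the total
# (illustrative labels, not the article's actual predictions).
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 1]

counts = confusion_matrix(y_true, y_pred)   # rows = actual, cols = predicted
percent = counts / counts.sum() * 100       # share of all predictions

print(counts)
print(percent)
```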




In the ROC curve, we see the difference between random guessing and the value the model actually adds.


In this case, it gives us an area under the curve (AUC) of 93%, which, together with the confusion matrix, is a good result: it shows that the model effectively predicts the target variable without overfitting (memorizing the data) or underfitting.
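For readers who want to see where that number comes from: the curve traces true-positive rate against false-positive rate as the decision threshold sweeps over the predicted probabilities, and the AUC is the area beneath it (0.5 is random guessing, 1.0 is perfect ranking). The scores below are invented for illustration.

```python
# How an ROC curve and its AUC are computed from predicted probabilities
# (toy values, not the platform's output).
from sklearn.metrics import roc_auc_score, roc_curve

y_true  = [0, 0, 0, 1, 1, 1]
y_score = [0.1, 0.45, 0.35, 0.8, 0.4, 0.9]  # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.3f}")
```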






Within the Insights section, we can see the importance of the features, both in a list of the main ones and in a graph.


Here we also find artificial variables created automatically by the platform through its feature engineering process.


In this case, features 3, 10, and 16 are artificial: the average price per room, the natural logarithm of the number of days in advance the reservation was made, and the natural logarithm of the number of adults on the reservation.
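These three engineered features are easy to reproduce by hand; the pandas sketch below does so with hypothetical column names, since the article does not list the dataset's schema.

```python
# Recreating the platform's three engineered features manually
# (column names are assumptions).
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "price": [200.0, 450.0],
    "rooms": [1, 3],
    "lead_time_days": [7, 90],
    "adults": [2, 4],
})

df["avg_price_per_room"] = df["price"] / df["rooms"]
df["log_lead_time"] = np.log(df["lead_time_days"])
df["log_adults"] = np.log(df["adults"])
print(df)
```

The log transforms are a common way to tame right-skewed variables like booking lead time, which is likely why the auto feature engineering chose them.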






To better understand each variable's influence on the result, we use the partial dependence plot.


We select each of the most important variables and analyze the graph to see how changing the values changes the result. Usefully, we can examine each feature independently and understand its influence.





To continue analyzing the results, the platform allows us to create what-if scenarios. Using some of the records or choosing one at random, we can see the influence that the changes in the main characteristics have on the result of the prediction.
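Under the hood, a what-if scenario boils down to copying a record, changing one feature, and comparing the two predictions. A hedged sketch with a toy model, since the platform's scenario engine is not exposed:

```python
# What-if analysis in miniature: perturb one feature of a record and
# compare predicted probabilities (toy data and model).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)            # outcome driven by feature 0
model = LogisticRegression().fit(X, y)

record = X[0].copy()
scenario = record.copy()
scenario[0] += 2.0                        # "what if feature 0 were higher?"

base_p = model.predict_proba([record])[0, 1]
what_if_p = model.predict_proba([scenario])[0, 1]
print(f"baseline: {base_p:.3f}, scenario: {what_if_p:.3f}")
```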


In this way, we can make decisions based on the data we have and focus on the specific details that most influence the result.


We can also compare the results against the distribution of each variable to observe its possible effect. With this, it is easy to think about changes to how we capture information, or about the best way to use what we already have available.



Export and Predict


Export the Visuals


Suppose you need to review the platform's charts in greater detail or discuss them with the rest of your team. This section allows you to export them, either as a PowerPoint deck ready to present or as a set of images in a Zip file.




New Data to Predict


To create predictions for new data with the trained model, we can do it directly on the platform, either by uploading a file or by using one that is already in the asset manager.
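Outside the platform, the equivalent is two steps: load the new file and call `predict` with the trained model. The file contents and column names below are hypothetical stand-ins for the uploaded reservations data.

```python
# Scoring a new file with a trained model (sketch; the in-memory
# "file" and its columns are hypothetical).
import io
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Train a toy model on two numeric features.
train = pd.DataFrame({"a": [0, 1, 2, 3],
                      "b": [1, 0, 1, 0],
                      "is_canceled": [0, 0, 1, 1]})
model = LogisticRegression().fit(train[["a", "b"]], train["is_canceled"])

# io.StringIO stands in for the uploaded CSV file.
new_file = io.StringIO("a,b\n0,1\n3,0\n")
new_data = pd.read_csv(new_file)

preds = model.predict(new_data[["a", "b"]])
print(preds)
```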




Integration with Designer


The last step is exporting the trained champion model so it can be deployed to production through Alteryx Designer or Alteryx Server, where it can be scheduled and called through APIs.






In this article, we reviewed a part of the Alteryx cloud platform, which allows us to easily create machine learning models to solve business challenges in a systematic way—focusing on the results rather than the process and details of creating them.


Not only can we create models, but the integration of the Alteryx platform also allows us to prepare, unify, and clean the data, as well as score it and put models into production to be executed automatically or through API calls with Alteryx Server.


As for the business challenge we used, the benefit is that, as a user with no data science background but with extensive business knowledge, I can now capitalize on that knowledge and create a model that helps identify the customers who will keep their reservations, in order to reduce costs and plan ahead for demand.


An additional benefit is that, using the most important features identified by the model, plus their partial dependence and the simulations, it is possible to design new strategies to attract customers or create promotions aligned with these variables.



Feature Importance



Partial Dependence of Variables





With these results and without using code or programming, within the same platform, we will be able to put the model into production to start seeing business benefits quickly, accelerating the time it takes to generate value with the solution.


It was a long journey, but I wanted to show you this new functionality with a real, end-to-end example, so that you can see the process, the ease of use, and, above all, the orientation towards business results rather than user specialization.

