Data Science

Machine learning & data science for beginners and experts alike.
Garabujo7
Alteryx

In this second part, we will see how to:

  • Select the ML techniques
  • Understand their differences
  • Compare your results
  • Get some prediction explanations
  • Export the analytics pipeline and customize it
  • Get the automatically generated Python code
  • Score new data
  • Tune hyperparameters
  • Export the reports for team discussion

 

We will start where we left off in part 1.

 

Select Algorithms

 

The last step lets us select the algorithms we want to use for prediction. This follows the “no free lunch” theorem, which states that no single algorithm is best for every problem; you have to try several to find the one that best fits your data and your specific situation.

 

MeganDibble_0-1660673832850.png

 

For a categorical target variable (classification), four algorithms are available:

  • Logistic regression
  • Decision tree
  • Random Forest
  • XGBoost

 

For a continuous (numerical) target variable (regression), there are three algorithms to choose from:

  • Linear regression
  • Decision tree
  • Random Forest

 

Each algorithm comes with its definition, pros, cons, and practical use cases.

 

MeganDibble_1-1660673832865.png

 

We click on Run the selected algorithms to train them.

 

Model Comparison

 

Once the training of the selected models is concluded, the Assisted Modeling tool presents the global and individual results together with an explanation of the metrics and a recommendation of the best model according to its accuracy and processing time.

 

MeganDibble_2-1660673832931.png

 

In this case, the platform recommends XGBoost as the best model, with an accuracy of 80% and a processing time of 13 seconds.
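
The ranking idea described above (best accuracy first, with time as a second criterion) can be sketched in plain Python. This is only an illustration, not how Assisted Modeling works internally: the three "models" are hypothetical hand-written rules, the holdout rows are invented, and the timing covers prediction only, whereas the tool reports its own processing time.

```python
import time

# Hypothetical holdout set: (hours_studied, absences) -> 1 = dropout, 0 = stay.
HOLDOUT = [
    ((2, 9), 1), ((8, 1), 0), ((3, 7), 1), ((9, 0), 0), ((6, 5), 0),
    ((7, 2), 0), ((4, 3), 0), ((1, 8), 1), ((3, 4), 1), ((2, 3), 0),
]

# Stand-ins for trained models (assumptions for illustration only).
def always_stay(features):
    return 0

def absence_rule(features):
    return 1 if features[1] >= 5 else 0

def combined_rule(features):
    hours, absences = features
    return 1 if absences - hours > 0 else 0

MODELS = {
    "always_stay": always_stay,
    "absence_rule": absence_rule,
    "combined_rule": combined_rule,
}

def evaluate(model, rows):
    """Return (accuracy, seconds spent predicting) for one model."""
    start = time.perf_counter()
    correct = sum(model(x) == y for x, y in rows)
    elapsed = time.perf_counter() - start
    return correct / len(rows), elapsed

def rank_models(models, rows):
    """Sort models by accuracy (descending), then by time (ascending)."""
    results = []
    for name, model in models.items():
        accuracy, seconds = evaluate(model, rows)
        results.append({"model": name, "accuracy": accuracy, "seconds": seconds})
    results.sort(key=lambda r: (-r["accuracy"], r["seconds"]))
    return results

for row in rank_models(MODELS, HOLDOUT):
    print(f'{row["model"]:<14} accuracy={row["accuracy"]:.0%}')
```

Sorting on accuracy first and time second is one possible trade-off; in practice you may prefer a slightly less accurate model if it is much faster or easier to explain.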

 

We can also evaluate the confusion matrices, which show how well each model predicts each class. This matters because, depending on the use case, some types of error are more costly than others.

 

MeganDibble_3-1660673832945.png
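
To make the idea of a confusion matrix concrete, here is a minimal sketch in plain Python. The labels and predictions are invented for illustration, and the function names are hypothetical; the tool computes and draws the matrix for you.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

def per_class_recall(actual, predicted):
    """For each class, the share of its records that were predicted correctly."""
    matrix = confusion_matrix(actual, predicted)
    recall = {}
    for label in sorted(set(actual)):
        total = sum(n for (a, _), n in matrix.items() if a == label)
        recall[label] = matrix[(label, label)] / total
    return recall

# Invented example: eight students, true outcome vs. model prediction.
actual    = ["dropout", "stay", "stay", "dropout", "stay", "dropout", "stay", "stay"]
predicted = ["dropout", "stay", "dropout", "stay", "stay", "dropout", "stay", "stay"]

matrix = confusion_matrix(actual, predicted)
for (a, p), n in sorted(matrix.items()):
    print(f"actual={a:<8} predicted={p:<8} count={n}")
print(per_class_recall(actual, predicted))
```

In this toy example the overall accuracy is 75%, but the model finds only two of the three real dropouts. That per-class view is what the confusion matrix adds over a single accuracy figure.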

 

Variable importance is also presented for each model. It tells us which variables matter most for predicting the target variable, so we can focus on the most relevant ones and take targeted action on those likely to have the greatest impact.

 

MeganDibble_4-1660673832951.png
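
One common way to estimate variable importance is permutation importance: shuffle one column at a time and measure how much accuracy drops. The sketch below illustrates that technique in plain Python. It is an assumption that this resembles what you see in the tool; Assisted Modeling may compute importance differently for each algorithm. The data, the stand-in model, and all names are hypothetical.

```python
import random

# Invented rows: (hours_studied, absences, age); label 1 = dropout.
ROWS = [(2, 9, 19), (8, 1, 20), (3, 7, 21), (9, 0, 19),
        (6, 5, 22), (7, 2, 20), (4, 3, 23), (1, 8, 21)]
LABELS = [1, 0, 1, 0, 1, 0, 0, 1]

def absence_model(row):
    """Stand-in for a trained model: it only looks at absences (column 1)."""
    return 1 if row[1] >= 5 else 0

def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean accuracy drop per column when that column's values are shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild each row with the shuffled value in position `col`.
            permuted = [r[:col] + (v,) + r[col + 1:] for r, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

names = ["hours_studied", "absences", "age"]
for name, value in zip(names, permutation_importance(absence_model, ROWS, LABELS)):
    print(f"{name:<14} {value:.3f}")
```

Because the stand-in model ignores hours studied and age, shuffling those columns changes nothing and their importance is zero, while shuffling absences lowers accuracy.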

 

Prediction Explanations

 

If you want more in-depth explanations of the model’s results, take a look at this article by Ira Watt, where he explains how, with a bit of custom Python code, you can get prediction explanations for models created with Assisted Modeling.

 

Python Code? No Problem

 

Are you a developer who prefers to write code by hand because it gives you more control? No problem: Assisted Modeling can still help. You can prototype the models you need and export them to Python, creating the base of your model with just a few clicks.

 

Select Export Model to Python.

 

MeganDibble_5-1660673832961.png

 

And now you can see the model in Python code within Alteryx Designer to start using it immediately.

 

To finish the process, select the winning model by clicking its check mark, then click Add models and continue to the workflow.

 

MeganDibble_6-1660673832978.png

 

Analytics Pipeline

 

Now you have a complete workflow with the pipeline that you can use to score your data: in batch with Designer or Alteryx Server, integrated within another system through the Alteryx Server REST API, or even in real time using Alteryx Promote.

 

MeganDibble_7-1660673832998.png

 

This shows the entire process of the model in Python code on the Jupyter Notebook included in the Python tool in Alteryx Designer.

 

MeganDibble_8-1660673833001.png

MeganDibble_0-1660675707243.png

 

Scoring

 

To score more data after model training, we can connect the new dataset and use the Predict Values tool to assign a dropout probability to each record.

 

MeganDibble_10-1660673833031.png
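
To illustrate what "assigning a probability to each record" means, here is a minimal sketch using a logistic function, as a logistic regression model would. It is not the Predict Values tool: the intercept, weights, field names, and threshold are all hypothetical values chosen for the example.

```python
import math

# Hypothetical coefficients; a real model learns these during training.
INTERCEPT = -2.0
WEIGHTS = {"absences": 0.5, "hours_studied": -0.3}

def dropout_probability(record):
    """Logistic function applied to a weighted sum of the record's fields."""
    z = INTERCEPT + sum(w * record[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def score(records, threshold=0.5):
    """Return copies of the records with a probability and a yes/no prediction."""
    scored = []
    for record in records:
        p = dropout_probability(record)
        scored.append({
            **record,
            "dropout_probability": round(p, 3),
            "predicted_dropout": p >= threshold,
        })
    return scored

# Invented new records to score.
new_students = [
    {"absences": 4, "hours_studied": 0},
    {"absences": 10, "hours_studied": 1},
    {"absences": 0, "hours_studied": 10},
]
for row in score(new_students):
    print(row)
```

Keeping the probability next to the yes/no prediction lets you change the threshold later, for example to flag more students for early intervention, without scoring the data again.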

 

Hyperparameter Tuning

 

Even after the model is finished, we can modify the hyperparameters of each model to refine it further, giving great flexibility to the process.

 

MeganDibble_11-1660673833050.png

 

The tool also explains each parameter you select.

 

MeganDibble_12-1660673833061.png
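
The effect of changing a hyperparameter can be illustrated with a small grid search: train the same algorithm with several settings and keep the one that scores best on validation data. The sketch below uses k-nearest neighbours only because it fits in a few lines; it is not one of the algorithms offered by Assisted Modeling, and the data and function names are invented.

```python
from collections import Counter

# Invented 1-D training data: ((feature,), label). Note the noisy point at 2.5.
TRAIN = [((1,), 0), ((2,), 0), ((3,), 0), ((4,), 0), ((2.5,), 1),
         ((6,), 1), ((7,), 1), ((8,), 1), ((9,), 1)]
VALID = [((2.4,), 0), ((2.7,), 0), ((3.6,), 0), ((5.6,), 1), ((7.5,), 1)]

def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point`."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def validation_accuracy(train, valid, k):
    return sum(knn_predict(train, x, k) == y for x, y in valid) / len(valid)

def grid_search(train, valid, grid):
    """Try every value of k in `grid`; return the best one and all scores."""
    scores = {k: validation_accuracy(train, valid, k) for k in grid}
    best_k = max(scores, key=scores.get)
    return best_k, scores

best_k, scores = grid_search(TRAIN, VALID, grid=(1, 3, 9))
print(best_k, scores)
```

With k=1 the model is misled by the single noisy point, and with k=9 it always predicts the majority class; the middle setting does best here. Tuning depth, number of trees, or learning rate in the tool follows the same logic of trying values and comparing results.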

 

Justify Decisions Through Self-documentation

 

You have created your first analytical model, but you are not an expert in this. How can you justify the results or explain them to the data science experts?

 


 

Do not worry—Assisted Modeling is here to help you.

 

While the assistant showed us what it was going to do at each stage, it was also recording our work. At the end of the process, it created the analytics pipeline with all the steps and decisions we made, so we can show and justify the work to the experts, as well as to quality assurance staff, auditors, and reviewers who need to verify how decisions are being made.

 

MeganDibble_14-1660673833568.png

 

The flow includes all the steps, and we can review and even modify them if necessary.

 

MeganDibble_0-1660681670014.png

 

Additionally, if you want to discuss the results with more people or in another context, you can export the results reports in HTML and take them with you to that important meeting.

 

MeganDibble_1-1660681720819.png

 

Putting Models in Production

 

After validating the model and making sure everything is correct, you can easily put it into production by uploading it to Alteryx Server.

 

This way, users can consume the model in a self-service fashion, schedule it for automated insights, or expose it to other apps or services through its REST API.

 

For that, you just need to export the selected trained model as a .YXDB object.

 

MeganDibble_2-1660681759204.png

 

Once you have the model, add it as an input and connect it, together with the new data to score, to the blue Predict tool.

 

MeganDibble_3-1660681788829.png

 

You can then upload it to Alteryx Server for batch scoring or execution via the REST API.

 

MeganDibble_4-1660681813423.png

 

Final Thoughts

 

This is true augmented intelligence: the ability to combine one's own experience with the potential of machine learning.

 

And this makes the democratization of analytics real: we don’t need to be experts to create a predictive model from scratch and get good, sound business results.

 

What really gives you power ...

 


 

And the thrill of solving with Alteryx.