SydneyF · ‎06-12-2019

Data science and analytics are important and rapidly growing fields. There is an active demand for people to continue upskilling and developing their knowledge of data-science applications like predictive modeling. With that in mind, we are happy to announce the beta release of Assisted Modeling, an application that will be included in Alteryx Designer to help aspiring citizen data scientists learn and use the predictive modeling process on their own data and real-life use cases.

You might be wondering: “Is Assisted Modeling right for me?”

Let’s find out together.

What is Assisted Modeling?

Assisted Modeling is an application that will be included in Alteryx Designer to help you get started with the predictive modeling process. Think of Assisted Modeling as your own personal data-science advisor sitting on your shoulder and giving you suggestions on where to go next. You’re still very much in control, but now you have a helpful guide leading you along, and teaching you concepts and ideas while you’re at it.

Who is Assisted Modeling for?

Assisted Modeling was designed for analysts who want to learn more about statistics, programming, and machine learning, but perhaps lack a significant background in these topics. That means if you’re a PhD-holding data scientist who has spent the last three years researching deep-learning frameworks, you won’t need Assisted Modeling’s help. This application is specifically for the aspiring citizen data scientist, who has an analytics and data background and is excited to learn more about predictive modeling and machine learning.

However, if you are experienced with statistics and analytics and would like to apply the processes and models used in Assisted Modeling, there is good news: all the cool features and models available as a part of Assisted Modeling are also available in the 2019.3 beta release as code-free Machine Learning tools. Beyond your data-science expertise, the only additional context you’ll need to use these tools is an understanding of the pipeline framework they’re built around.

Why Assisted Modeling?

Assisted Modeling is different from other types of automated modeling in that you have a hand on the steering wheel the entire time. With Assisted Modeling, you can leverage your business and data expertise as a part of the process. You can ask if each step in your process makes sense in relation to what you know about the data, and what it means in the context of the real world. Assisted Modeling is also an opportunity to learn, step-by-step, what the process feels like. Instead of depending on a black box, you will have the opportunity to keep transparency in your model, while also learning something new.

How does Assisted Modeling work?

Assisted Modeling looks like a standard Alteryx tool. To use it, you’ll connect it to your dataset and run your workflow to load your data into Assisted Modeling (just as you would with the Insight tool, Interactive Chart tool, or Python tool).

Once the data is loaded in, click on the “Start Assisted Modeling” button in the Configuration window.

This is where things get interesting! Once you click the “Start Assisted Modeling” button, a modal window pops open. In large, friendly letters, you’ll see the heading “Getting started with assisted modeling.” On the right side of the screen, there are drop-down tabs for each step in the predictive modeling process that Assisted Modeling addresses, each with a description of what the step entails (with pictures!).

Once you’re oriented, you can choose to skip this screen whenever you use the Assisted Modeling application in the future by checking the box in the bottom-left corner.

When you’re ready to start the process, click “Start Building.”

Step one: Select target variable and model type

The first step in the Assisted Modeling process is to select your target variable from your dataset. The target variable is the variable (column) in your dataset that you are interested in creating a model for.

The data type of the variable you’re interested in modeling will determine the type of modeling you are going to perform. Eventually, Assisted Modeling will include the three major types of predictive models: regression, classification, and clustering. For now, the beta just includes classification, which can make picking your target variable pretty easy. 😊 In the beta, you will only be able to select categorical variables (variables where each value in the column represents a group or category) in your data as a target variable.

When you’ve selected your target variable, you can click the blue Next button in the bottom-left corner of the screen to move on to the next step. Because the options in Assisted Modeling differ slightly depending on the type of modeling you are doing, selecting a target variable effectively locks you into a model type (classification, regression, or clustering). Assisted Modeling will check in with you to confirm that you’ve selected the target variable you want to work with.

If you feel good about your target variable selection, you can click Continue.

With your target variable selected, you can move on to the next screen, which deals with data typing for the rest of your variables.

Step two: Data typing

In this step, Assisted Modeling will make recommendations on a data type for each of your variables. Selecting the correct data types for the variables in your dataset is important because predictive models handle each data type differently. There are four possible data types in Assisted Modeling: numeric, categorical, boolean, and ID. The ID data type is for unique identifier columns, which help keep track of your data but don’t have any predictive value – any column assigned the ID type will be dropped.

The variables in this screen are sorted, pushing the variables that most need to be reviewed by you to the top. You will be able to override the data type recommendations given by Assisted Modeling. It is important to check these proposed changes over because you understand what your data is and how it should be handled. Assisted Modeling is here to give you suggestions and help guide you, but at the end of the day, you need to make sure the choices and results feel logical to you.
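
To get a feel for what a data type recommendation might involve, here is a rough Python sketch. It is not the logic Assisted Modeling uses (this post doesn't describe those rules); the function name suggest_type, the token list, and the sample columns are all made up for illustration.

```python
# Hypothetical heuristic for illustration only; not Alteryx's actual rules.
BOOLEAN_TOKENS = {"true", "false", "yes", "no", "0", "1"}


def suggest_type(values):
    """Suggest one of: 'boolean', 'id', 'numeric', 'categorical'."""
    present = [v for v in values if v is not None]
    if not present:
        return "categorical"  # nothing to go on; leave the call to the user

    # Two-valued columns made of yes/no-style tokens look boolean.
    tokens = {str(v).strip().lower() for v in present}
    if tokens <= BOOLEAN_TOKENS and len(tokens) <= 2:
        return "boolean"

    # Integer or text columns where every value is unique look like identifiers.
    all_unique = len(set(present)) == len(present)
    if all_unique and len(present) > 1 and all(
        isinstance(v, (int, str)) for v in present
    ):
        return "id"

    # Remaining all-number columns are numeric; everything else is categorical.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "numeric"
    return "categorical"


columns = {
    "customer_id": ["C001", "C002", "C003", "C004"],
    "age": [34, 29, 41, 29],
    "plan": ["basic", "pro", "basic", "pro"],
    "churned": ["yes", "no", "no", "yes"],
    "income": [52000.5, 61000.0, 48000.25, 75500.0],
}
suggestions = {name: suggest_type(values) for name, values in columns.items()}
for name, suggested in suggestions.items():
    print(f"{name}: {suggested}")
```

Notice that a heuristic like this would label a small integer column as an ID whenever its values all happen to be unique. That is exactly the kind of mistake a review by someone who knows the data will catch.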

With your data types sorted, you can move to the next step, which involves imputing values to replace nulls in your dataset.

Step three: Clean up dataset

By default, any numeric variables will be set to impute null values with the median of the variable, and categorical variables will be set to impute null values with the mode of the variable. You can override these defaults and impute the null values with the mean of the variable instead.

This dashboard will only show columns that have missing values, so if nothing comes up here, you have nothing to worry about!
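
If you'd like to see what median, mode, and mean imputation do to a column, here is a small Python sketch using only the standard library. It shows the general idea rather than Assisted Modeling's implementation; the impute helper, the table, and its values are invented for the example, and nulls are represented as None.

```python
from statistics import mean, median, mode

# Strategy names mirror the options described above; the mapping itself is
# an assumption made for this sketch.
STRATEGIES = {"median": median, "mode": mode, "mean": mean}


def impute(values, strategy):
    """Return a copy of values with None replaced by the chosen statistic."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # an all-null column has nothing to impute from
    fill = STRATEGIES[strategy](present)
    return [fill if v is None else v for v in values]


table = {
    "age": [34, None, 29, 41, None, 38],
    "plan": ["basic", "pro", None, "basic", "pro", "basic"],
    "tenure": [1, 2, 3, 4, 5, 6],
}

# Only columns that actually contain nulls need attention.
needs_imputing = [name for name, values in table.items() if None in values]

age_filled = impute(table["age"], "median")      # default for numeric columns
plan_filled = impute(table["plan"], "mode")      # default for categorical columns
age_mean_filled = impute(table["age"], "mean")   # the override option

print(needs_imputing)
print(age_filled)
print(plan_filled)
```

The median is less sensitive to extreme values than the mean, which is one common reason to prefer it as a default for numeric columns.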

This brings you to the next step, which is variable selection.

Step four: Select columns

Here, you can remove any variables (columns) that should not be included in your model, including variables with low predictive power.

With your data prepared, you can now use it to train a selection of models.

Step five: Select models

You will be given multiple algorithms (the recipes for different flavors of predictive models) to choose from. You can choose to run any combination of the available models on your data set. Click Run selected models once you’ve made your choices and are ready to see how they perform!

At this point, your data will be run through each of the models to determine which one performs the best for your given dataset. When it is done running, you’ll see the option to View in leaderboard for each model.

A technical aside …

Peeking under the hood of Assisted Modeling, what actually happens here is that your dataset is divided into three groups used to run three-fold cross-validation, which is a way to estimate the performance of a model on an “unseen” dataset. Effectively, one-third of the dataset is held out while the other two-thirds are used to train a model, and then the held-out data is run through the model and used to calculate metrics like accuracy (how many records the model correctly predicted). This process is repeated three times so that each “split” of the data is used as the holdout once, and the resulting metrics are averaged across the three iterations. The process is the same for each of the models you elected to run, which allows you to compare the models in a meaningful way.
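
For the curious, three-fold cross-validation can be sketched in a few lines of plain Python. This is an illustration of the technique, not Assisted Modeling's code: the function names are hypothetical, the folds are formed by simple interleaving rather than random shuffling, and the "model" is a stand-in that always predicts the most common label it saw during training.

```python
from collections import Counter


def cross_validate(rows, labels, train, predict, k=3):
    """Return (mean accuracy, per-fold accuracies) from k-fold cross-validation."""
    n = len(rows)
    # Interleaved folds: record i goes to fold i % k (an assumption for brevity).
    folds = [list(range(start, n, k)) for start in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(n) if i not in held]
        # Train on the other two-thirds of the data...
        model = train([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        # ...then score on the held-out third the model has never seen.
        correct = sum(predict(model, rows[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k, scores


# Stand-in model: remembers the majority label and ignores the features.
def train_majority(train_rows, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, row):
    return model


rows = list(range(9))  # placeholder features
labels = ["no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "no"]

mean_accuracy, fold_scores = cross_validate(
    rows, labels, train_majority, predict_majority
)
print(fold_scores, mean_accuracy)
```

Swapping in a different train/predict pair and calling cross_validate again on the same data is what makes the averaged scores comparable across models.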

If that “technical aside” doesn’t make a ton of sense to you yet, don’t worry! All you need to know is that Assisted Modeling is checking how well each of the models performs on your dataset.

The resulting metrics are populated in the next screen – the leaderboard. Each model you ran will have its own dashboard of metrics. In the beta release, there are three tabs: “Comparison”, “Overview”, and “Configuration”.

The Comparison tab reports a variety of metrics for each model and displays them on a single screen. There is also a super-handy glossary on the right-hand side that provides a definition for any term or metric you might not be familiar with.

“Overview” gives you the breakdown of a couple of important metrics calculated to determine how well the model performed: accuracy and balanced accuracy. Accuracy is a simple percentage of how many rows the model predicted correctly for the whole dataset. Balanced accuracy is the percentage of correctly predicted records calculated separately for each possible target value and then averaged, so that every target value counts equally. This is helpful for datasets where your target variable is imbalanced (e.g., “yes” occurs twice as frequently as “no”) because it allows you to check that each of your possible target values is being predicted with similar accuracy.

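To see why the two metrics can disagree, here is a short Python sketch. It assumes the common definition of balanced accuracy (the average of the per-class accuracy, also called recall); the post doesn't spell out the exact formula Assisted Modeling uses, and the data below is invented.

```python
def accuracy(actual, predicted):
    """Share of all records predicted correctly."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def balanced_accuracy(actual, predicted):
    """Average of the per-class accuracy, so each target value counts equally."""
    per_class = []
    for cls in sorted(set(actual)):
        idx = [i for i, a in enumerate(actual) if a == cls]
        per_class.append(sum(predicted[i] == cls for i in idx) / len(idx))
    return sum(per_class) / len(per_class)


# "yes" occurs twice as often as "no", and the model always answers "yes".
actual = ["yes"] * 8 + ["no"] * 4
predicted = ["yes"] * 12

print(round(accuracy(actual, predicted), 3))   # 0.667
print(balanced_accuracy(actual, predicted))    # 0.5
```

A model that never predicts “no” still gets two-thirds of the rows right, but its balanced accuracy drops to one-half because it gets every “no” record wrong.
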
“Configuration” shows the details of everything you’ve done with your data in Assisted Modeling. There are nice drop-down tabs for each step in the process.

With all this information and helpful definitions embedded into the application, you can select the best-performing model for your use case. When you’ve made your decision, you can select “Use this model” on the left-hand side of the screen for the model of your choice.

As a final step, Assisted Modeling outputs a workflow composed of Machine Learning tools that execute the process you developed with Assisted Modeling.

That’s Assisted Modeling in Alteryx, from start to finish!

What about the R-based predictive tools?

If you know and love the R-based predictive tools, not to worry! They aren’t going anywhere; they will remain a separate installation from Alteryx Designer. Assisted Modeling is included in a beta release of Alteryx Designer – no separate installation required.

Just go for it!

As an Alteryx Designer user, you can access Assisted Modeling at no additional cost, so if you think it might be right for you, why not give it a try? If you sign up for the beta, you can even help refine and improve Assisted Modeling by providing feedback.
