We’ve heard your feedback, and you’ve made one thing clear: a citizen data scientist is not a single role with a defined skillset, but a spectrum of roles with varying analytics needs and levels of expertise. To help meet the needs of whatever role you’re in, we are thrilled to expand Intelligence Suite’s Machine Learning tool group with four new tools: Data Health, AutoML, Feature Types, and Build Features. We developed three of them (AutoML, Feature Types, and Build Features) using our Innovation Labs’ data science libraries, so you can now unleash the power of these open-source Python packages in an Alteryx workflow. The fourth, Data Health, is a new tool that helps you quickly identify how healthy your data is before training your model.
Data Health
Whether you’re just starting your journey to citizen data science or you want to streamline your machine learning pipeline, the Data Health tool is for you. The Data Health tool allows you to check on the state of your data for predictive modeling.
This tool gives you insight into your dataset by focusing on six metrics, including missing values, uniqueness, sparsity, unary fields (like IDs), and outliers. That way you can fix your data’s problems and build your best model on the first try. Configuration is simple: all you have to do is choose whether you want your scores normalized (0-1) or as percentages (0-100%), and the tool then provides scores based on the six metrics. Want to learn more? Check out the help docs here.
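To make a few of these metrics concrete, here is a minimal sketch in plain Python. The function name `health_metrics`, the `as_percent` flag, and the formulas themselves are assumptions made for illustration; they are not the Data Health tool's actual scoring, and outlier detection is left out to keep the example short.

```python
def health_metrics(values, as_percent=False):
    """Compute simple health indicators for one column of values.

    None marks a missing value. The formulas are illustrative assumptions,
    not the Data Health tool's actual scoring.
    """
    total = len(values)
    present = [v for v in values if v is not None]
    distinct = set(present)
    metrics = {
        # Share of rows with no value.
        "missing": (total - len(present)) / total if total else 0.0,
        # Share of non-missing values that are distinct.
        "uniqueness": len(distinct) / len(present) if present else 0.0,
        # Share of non-missing values equal to zero (a simple sparsity proxy).
        "sparsity": sum(1 for v in present if v == 0) / len(present) if present else 0.0,
    }
    # Report on a 0-1 scale by default, or on a 0-100 scale on request.
    scale = 100.0 if as_percent else 1.0
    scores = {name: round(score * scale, 2) for name, score in metrics.items()}
    # A unary field holds exactly one distinct value.
    scores["unary"] = len(distinct) == 1
    return scores


column = [0, 0, 5, None, 0, 7, 0, None]
normalized = health_metrics(column)
percent = health_metrics(column, as_percent=True)
print(normalized)
print(percent)
```

The two calls differ only in scale, which mirrors the normalized-versus-percentage choice in the tool's configuration.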
Feature Types and Build Features
If you’re closer to the advanced end of the spectrum (already comfortable using Assisted Modeling) and want to enrich your datatypes or generate new features, use the Feature Types and Build Features tools together to take your models to the next level.
The Feature Types tool detects enhanced datatypes. Enhanced datatypes are just fancy versions of regular datatypes. For example, a five-digit number has a basic datatype of numeric, but its enhanced datatype could be detected as a US ZIP Code. By default, the Feature Types tool automatically detects the enhanced type of each data field when you run the workflow. You can override the automatically detected type by specifying the datatype in the Change Type column.
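As a rough illustration of the idea, the sketch below guesses an enhanced type for a column of strings and lets a user-supplied override win, much like the Change Type column. The function names, type labels, and detection rules are all hypothetical; the tool's real detection logic is not shown in this post.

```python
import re

# Hypothetical rule: exactly five digits looks like a US ZIP Code.
ZIP_PATTERN = re.compile(r"\d{5}")


def detect_type(values):
    """Guess an enhanced type label for a column of strings."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return "unknown"
    if all(ZIP_PATTERN.fullmatch(v) for v in non_empty):
        return "us_zip_code"
    try:
        for v in non_empty:
            float(v)  # Raises ValueError if the value is not a number.
        return "numeric"
    except ValueError:
        return "text"


def resolve_type(column_name, values, overrides=None):
    """Prefer a manual override over the automatically detected type."""
    overrides = overrides or {}
    return overrides.get(column_name, detect_type(values))


print(resolve_type("zip", ["80301", "10001"]))
print(resolve_type("zip", ["80301", "10001"], {"zip": "numeric"}))
```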
The Feature Types tool is a required input to the Build Features tool for effective feature engineering and an optional input to the Data Health tool to enhance the report output.
The Build Features tool automatically creates new features from your existing data. This helps format your data in a way that the machine learning model can analyze, which increases the chance the machine learning model will find meaningful patterns. It helps by uncovering variables you might not have considered (or prioritized). For example, you can transform a column for “date of birth” into new features like “age” or “birthday month.”
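The date-of-birth example can be sketched in a few lines of plain Python. This is a hand-written illustration, not what the Build Features tool runs internally; the function name `build_date_features` and the fixed reference date are assumptions.

```python
from datetime import date


def build_date_features(date_of_birth, today):
    """Derive 'age' and 'birth_month' features from a date of birth."""
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    age = today.year - date_of_birth.year - (0 if had_birthday else 1)
    return {"age": age, "birth_month": date_of_birth.month}


features = build_date_features(date(1990, 6, 15), today=date(2021, 3, 1))
print(features)  # {'age': 30, 'birth_month': 6}
```

Passing `today` explicitly keeps the derived age reproducible, which matters when the same workflow is re-run later.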
AutoML
After you’ve made sure your data is healthy and you move on to modeling, you may want to schedule your model-training workflow in Alteryx Analytics Hub or Server. All of Intelligence Suite’s Machine Learning tools work great in Hub- and Server-hosted workflows, and scheduling your model-training workflow to retrain your model on a periodic basis is a great way to keep your model performing at its best.
But what happens if you expect your training data to change a lot over time? In that case you may want the flexibility to automatically update your model’s algorithm to the best fit for your latest training data (e.g., going from a Random Forest model to an XGBoost model) vs. committing to the algorithm you selected when you first built your model-training workflow (e.g., always fitting a Random Forest model).
If so, we’ve made you a tool that’s as easy to use as Assisted Modeling and perfect for that situation. The AutoML tool allows you to train a model without the pop-up interface of Assisted Modeling, and it intelligently selects the best algorithm for you. Although the AutoML tool doesn’t offer the guided experience of Assisted Modeling, it does provide the same power of the EvalML auto-modeling library in a single, self-contained Alteryx tool. To use the AutoML tool, simply select your target variable. From there, the machine learning method is automatically configured based on the model estimate. That said, you can always override it by manually selecting the machine learning method.
This tool can also be useful if you are a very advanced user who wants to quickly create effective machine learning models directly in a workflow or create analytic apps that select a new trained model on every run.
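The idea of re-selecting the algorithm on every training run can be sketched with a toy example. This is not EvalML and not how the AutoML tool is implemented: the two candidate "algorithms" (a majority-class baseline and a single-feature threshold rule), the holdout split, and every name below are assumptions chosen to keep the sketch free of dependencies.

```python
def fit_majority(X, y):
    """Baseline candidate: always predict the most common training label."""
    majority = max(set(y), key=y.count)
    return lambda row: majority


def fit_threshold(X, y):
    """Rule candidate: predict 1 when the first feature exceeds the best cut."""
    best_cut, best_hits = None, -1
    for cut in sorted({row[0] for row in X}):
        hits = sum((row[0] > cut) == bool(label) for row, label in zip(X, y))
        if hits > best_hits:
            best_cut, best_hits = cut, hits
    return lambda row: int(row[0] > best_cut)


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def select_best(candidates, X_train, y_train, X_val, y_val):
    """Train every candidate and keep the one with the best holdout accuracy."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(X_train, y_train)
        scores[name] = accuracy(model, X_val, y_val)
    return max(scores, key=scores.get), scores


X_train = [[1], [2], [3], [4], [6], [7], [8]]
y_train = [0, 0, 0, 0, 1, 1, 1]
X_val = [[2], [4], [7], [9]]
y_val = [0, 0, 1, 1]

candidates = {"majority": fit_majority, "threshold": fit_threshold}
best_name, scores = select_best(candidates, X_train, y_train, X_val, y_val)
print(best_name, scores)
```

Because the winner is chosen from scores on the latest data, re-running the same code on new training data can pick a different candidate, which is the behavior a scheduled retraining workflow would rely on.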
Try It Out Yourself
You can download the Intelligence Suite starter kit today to explore pre-built templates with sample data, workflows, and use cases. The new tools are available in Designer 2021.1 with an Alteryx Intelligence Suite license (contact your account representative!), and they come with sample workflows (Help > Sample Workflows > Learn one model at a time) to help you get started. You can also check out this post for a detailed example of how to use the new tools.