As the data landscape evolves, organizations’ analytical needs grow increasingly complex. Historically, modern data-science techniques were limited to the select few who were not only expert programmers but also deeply versed in statistics. That is no longer true! The Alteryx Intelligence Suite democratizes advanced analytical capabilities for any interested Alteryx user, bringing the power of predictive machine learning and natural language processing to every organization looking to unlock the power of its data with Alteryx.
In our initial release of the Alteryx Intelligence Suite, we chose to focus on two of the most common data science challenges facing organizations today: mining insight from unstructured text and building predictive machine learning models.
The Alteryx Intelligence Suite is designed for organizations at every stage of their analytic journey. For a beginner analyst, or an organization just starting to adopt advanced analytics, the Intelligence Suite gives you everything needed to get going, with the confidence that the choices you make through our drag-and-drop building blocks and on-screen guides are backed by best-in-class data science from established open-source libraries like scikit-learn and XGBoost. For advanced users, the building blocks expose in-depth configuration and customization of these libraries, integrated into the Alteryx environment.
Users can begin to explore predictive problems, such as ranking the customers most likely to churn or predicting the probability of an event of interest, using Assisted Modeling. As the organization matures, models can be deployed to production via Alteryx Promote or Alteryx Server. If desired, models can always be translated to raw Python code to share with other data scientists or to deploy in a cloud ecosystem.
Whether used for prototyping or production, the process is transparent, letting business analysts and citizen data scientists work together. The same commitment to best-in-class data science applies to our text mining building blocks: they are built on libraries like Tesseract, VADER, and scikit-learn, ensuring that users get the strongest capabilities available in the market, all with the ease of use of Alteryx.
The Intelligence Suite’s Text Mining Capabilities
I’m excited to highlight some of the amazing capabilities of our Text Mining tool group. At its core, the Text Mining tool group makes it simple to get text into Alteryx from any format, including from PDFs and images via optical character recognition (OCR). This capability alone opens a whole new way for users to bring data into Alteryx. And once the data is there, the Text Mining tool group also provides building blocks for manipulating and processing it even further.
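For readers curious what OCR looks like outside of Designer, here’s a minimal sketch using the open-source Tesseract engine (which our text mining building blocks are built on) through the pytesseract and pdf2image packages. This is not the Intelligence Suite’s implementation, just the kind of work the drag-and-drop experience automates; the file name is a hypothetical example.

```python
# A minimal OCR sketch: render a PDF's pages as images, then extract text with Tesseract.
# Requires the Poppler and Tesseract binaries to be installed on the machine.
from pdf2image import convert_from_path  # renders PDF pages as PIL images
import pytesseract                       # Python wrapper around the Tesseract OCR engine

pages = convert_from_path("invoice.pdf")  # hypothetical input file
text = "\n".join(pytesseract.image_to_string(page) for page in pages)
print(text[:500])  # peek at the first 500 recognized characters
```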
Preparing Text for Analysis
The tool group includes a building block specifically for prepping text data for analysis by performing “lemmatization.” Put simply, lemmatization reduces the different forms of a word to its base grammatical form. For example, “am” / ”are” / ”is” all become “be,” and “cat” / “cats” / “cat’s” / “cats’” all become just “cat.” When applying machine learning to text, this step is crucial for generalizing large bodies of complex text into a simpler underlying structure.
With the Intelligence Suite, the task is as easy as dragging a building block onto the Designer canvas and clicking your way to a custom configuration.
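To give a sense of what that step looks like in raw Python, here’s a minimal lemmatization sketch using spaCy, one popular open-source option (not necessarily the library behind the building block); the sample sentence is just an illustration.

```python
# A minimal lemmatization sketch with spaCy.
# One-time setup: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cats are chasing mice because a cat's instinct is strong")

# Each token carries its lemma: "cats" -> "cat", "are"/"is" -> "be", "chasing" -> "chase", etc.
print([token.lemma_ for token in doc])
```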
Social Listening
Mining the social web has become a disruptive new way for organizations to understand their product impact in near real time. Tweets can be collected and classified as positive, neutral, or negative, and businesses can track a daily “positive to negative ratio” of commentary to see how the web is reacting. That said, determining a tweet’s sentiment, and doing it at scale, used to require someone to get deep into code. With our code-free sentiment analysis building block, this becomes an easy task.
With a very simple workflow leveraging the Intelligence Suite, you can create an effective way to bulk analyze tweets!
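Since the building block is built on VADER, here’s a rough idea of what that scoring looks like under the hood, sketched with the open-source vaderSentiment package. The tweets are made-up examples, and the ±0.05 compound-score cutoffs are a common VADER convention rather than the tool’s exact configuration.

```python
# A minimal VADER sentiment sketch: score a few example tweets and bucket them.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
tweets = [
    "Loving the new release, setup took five minutes!",      # hypothetical examples
    "Support never answered my ticket. Frustrating.",
    "The update shipped today.",
]

for tweet in tweets:
    compound = analyzer.polarity_scores(tweet)["compound"]   # -1 (negative) to +1 (positive)
    label = "positive" if compound >= 0.05 else "negative" if compound <= -0.05 else "neutral"
    print(f"{label:>8}: {tweet}")
```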
Topic Modeling
Michael Jordan, along with David Blei and Andrew Ng, is one of the authors of the journal article introducing Latent Dirichlet Allocation, the research that underlies the field of topic modeling. To no one’s surprise, this isn’t the same Jordan as the 14-time NBA All-Star and brief minor-league baseball player for my favorite team, the Chicago White Sox. Imagine, though, that you had two giant blocks of text: one about the sports star and one about the University of California, Berkeley machine-learning star. How could you tell them apart?
Well, the distribution of words in those documents would likely be very different. Topic modeling looks at those distributions, recognizing that some words may be common to both documents but that words tend to co-occur in distinct patterns. Applying topic modeling to these texts could help you annotate all your documents with topics like “Basketball” or “Machine Learning,” but you might also discover other themes, like “Sneakers” or “Space Jam,” that could help you further organize, search, or summarize your texts. One can imagine how organizations armed with lots and lots of text documents could begin to put this technology to work.
During my PhD, I had the luxury of learning topic modeling from John Lafferty, a coauthor of David Blei’s. Bringing this technology to users of all academic and professional backgrounds is a personally meaningful and exciting venture in democratizing data science! Now, instead of struggling to get code working on top of complex underlying mathematical models, I can drag and drop tools in Alteryx and quickly start to explore the topics in any set of documents.
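For anyone curious, here’s roughly how small that code has become with modern open-source libraries: a toy Latent Dirichlet Allocation run in scikit-learn, with four made-up snippets standing in for the two Jordans. A real corpus would need far more text for stable topics, and this is only an illustration of the technique, not the Topic Modeling building block itself.

```python
# A toy topic-modeling sketch: count words, fit LDA, and print the top terms per topic.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "Jordan won six NBA championships and defined basketball sneakers",
    "Blei, Ng and Jordan introduced latent Dirichlet allocation for topic models",
    "The playoffs showcased clutch basketball scoring and defense",
    "Bayesian machine learning models infer latent topics from word counts",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top_terms = [terms[j] for j in topic.argsort()[-5:][::-1]]
    print(f"Topic {i}: {', '.join(top_terms)}")
```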
Visualizing Your Output
The Text Mining tool group lets you build word clouds from your output, giving you a graphical representation of your analysis, complete with filters and options to make your graphics shine! For example, below is our data science word cloud, in the shape of a cloud.
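And for anyone wondering what the code-heavy alternative looks like, here’s a bare-bones sketch using the open-source wordcloud package. It’s illustrative only: the building block handles shaping, filtering, and styling without any code, and the input text here is just a placeholder.

```python
# A minimal word cloud sketch: build frequencies from raw text and render the image.
import matplotlib.pyplot as plt
from wordcloud import WordCloud

text = ("data science data science machine learning learning text mining "
        "topic model sentiment designer alteryx analytics analytics")

cloud = WordCloud(width=800, height=400, background_color="white").generate(text)

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```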
Machine Learning with Alteryx Intelligence Suite
Walking through all our new machine learning capabilities is too much content for this post, so instead I’d like to focus on some of my favorite features in the new Machine Learning tool group.
Full Transparency and Control
The Assisted Modeling building block keeps humans in the loop with machine learning. While it profiles your data to make the best suggestions possible, drawing on multiple heuristics and best practices, no one knows your data better than you! As opposed to other black-box solutions, Assisted Modeling shows why it’s making each recommendation and with what certainty, and it always allows you to override its choices.
Feature Importance
Picking the right data for a model is hard. If you’re not careful, data that wouldn’t be available to the model in the future can accidentally be included in your training set. This phenomenon is often referred to as “data leakage” and can cause models in production to fail entirely or produce subpar results. At the other end of the spectrum, we often don’t know what data is important to the task at hand, so we throw in everything we have. That is often the best agnostic approach; however, it can slow down the modeling process and has the potential to complicate algorithms, causing them to perform worse than they would otherwise.
Assisted Modeling uses two techniques, Gini impurity and Goodman-Kruskal tau, to identify the best set of features for efficiently generating an unbiased, high-quality model.
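To illustrate the Gini side of that ranking, here’s a sketch using scikit-learn’s tree-based feature importances on a bundled sample dataset. Goodman-Kruskal tau has no one-line scikit-learn equivalent, and this is not Assisted Modeling’s actual implementation, just the general idea of scoring features by how much they help a model.

```python
# A Gini-importance sketch: fit a random forest (which splits on Gini impurity by default)
# and rank features by how much they contribute to those splits.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

ranked = sorted(zip(X.columns, forest.feature_importances_),
                key=lambda pair: pair[1], reverse=True)

for name, score in ranked[:5]:          # top five features by Gini importance
    print(f"{name:25s} {score:.3f}")
```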
Leaderboard Explorations
Perhaps my favorite theorem in all of machine learning is the “No Free Lunch” theorem. Roughly paraphrased, it implies that there’s no way to know in advance which modeling algorithm is going to be right for any particular dataset. While XGBoost may be best for one set of data, a simple linear model could work well for another. Our only solution to this problem is to run multiple models on training data and empirically see which one works best.
Assisted Modeling’s leaderboard page allows us to do just that, with multiple models optimized to run in parallel given the constraints of your computer.
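In plain scikit-learn, that empirical bake-off might look something like the sketch below: cross-validate a few algorithm families on the same data and rank the results. This is a simple sequential comparison for illustration, not Assisted Modeling’s parallel, optimized engine, and the dataset is just a bundled sample.

```python
# A minimal "leaderboard" sketch: score several model families with 5-fold cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

candidates = {
    "logistic regression": LogisticRegression(max_iter=5000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
}

scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} mean accuracy {score:.3f}")
```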
Upskilling
For many analysts, the most valuable part of Assisted Modeling will be that it helps you get better at machine learning, giving you the option to see your work graphically or as bare code. It carefully guides you through the modeling process, explaining what it is doing and why, while providing a detailed glossary that explains terms and methodology in plain English. You can simply click through the default options or, as you gain experience, begin to experiment on your own, putting the “science” in data science! As you practice, you can skip assisted mode altogether and build models right on the canvas. Ultimately, you can turn your model into raw Python code, letting you model in the graphical interface and then see and edit, in code, what your guided modeling experience created.
Whether you’re a newcomer or experienced, Assisted Modeling helps you build or prototype, and ultimately share or explore models in their native Python representation, completing the journey from building blocks to executable code.
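Purely as illustration, here’s a hand-written scikit-learn pipeline in the spirit of what model code can look like once it lives in Python. The column names, the churn example, and the model choice are hypothetical, and the code Assisted Modeling actually generates will differ.

```python
# An illustrative preprocessing-plus-model pipeline, written by hand (not exported code).
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression

numeric = ["tenure_months", "monthly_spend"]   # hypothetical churn features
categorical = ["plan_type", "region"]

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer()),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])

model = Pipeline([("preprocess", preprocess),
                  ("classify", LogisticRegression(max_iter=1000))])

# model.fit(training_frame[numeric + categorical], training_frame["churned"])
```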
Parting Words
I’m excited to see what solutions you come up with using the building blocks in the Alteryx Intelligence Suite!
Contact your account representative to purchase a license of the Alteryx Intelligence Suite, available in Designer in 2020.2, so you can start creating and democratizing data science across your organization. If you don't know your rep, or have questions about how to get started, head over to our Support Contact Portal and select Request Access to Intelligence Suite so we can get you set up and ready to go.
Alex is focused on guiding product teams charged with defining and executing the strategy for Alteryx’s advanced analytics and data science products, along with creating SDKs for the Alteryx end-to-end platform.