Co-authored with @clinton_regan.
We’ve always wanted building workflows with Alteryx Designer to be an intuitive, easy process. But now we’ve built Designer to have some intuition of its own.
In Alteryx Designer 2021.3, we introduced a new tool group you may have noticed: the Recommended tools category, which suggests the tools you may need next as you build your workflow. This feature will guide you toward tools that best fit your workflow’s direction and likely goals. We hope newer users will appreciate this tool group’s guidance, while experienced users may discover new tool options to help them achieve their analytic goals efficiently.
But, of course, those recommendations are built on some data science! We thought the data science-inclined among us would enjoy some details about the construction of this feature, including how the model was trained and how the recommendation engine was integrated into Designer.
Similar to text autocomplete on your mobile device, the predictive model behind the Recommended category is specifically trained to provide users with the set of tools most likely to occur next in their workflow, in addition to increasing users’ exposure to rare tools at the right opportunity in the workflow building process.
Borrowing a technique from natural language processing (NLP), we first used Alteryx Designer to extract sequences of tools from deidentified, aggregate workflow data, treating them like sequences of words in a sentence. We then trained a Long Short-Term Memory (LSTM) classifier to identify common patterns in the tool sequences and predict the tools most likely to be added to the canvas next, again much like text autocomplete on your smartphone.
We could have modeled the workflow data using its native graph structure and a graph-based neural network, such as a graph convolutional network (GCN), but this would not lend itself to easy interpretation of workflow patterns. On the other hand, enumerating tool sequences from millions of workflow graphs would be a slow and memory-intensive process, if not for the efficient parallelization provided by Alteryx Engine. Using a cloud-based Alteryx Server, we were able to quickly extract millions of tool sequences by recursively joining each tool in the workflow data to its parent (upstream) tools using an iterative macro.
With tool sequences (sentences) in hand, we trained the LSTM model to predict what the last tool in each sequence would be, using Designer’s Python Tool and Keras. In general, LSTMs are powerful model architectures for discovering patterns in sequence data, but they can require a bit of optimization. In particular, we found that representing each tool type by its most common surrounding tools, using the popular word2vec algorithm, led to large improvements in classification accuracy over a one-hot encoded representation, which captures only the tool type.
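To make the training setup a little more concrete, here is a minimal sketch of how tool sequences could be turned into (input, label) pairs, where the label is the last tool in each sequence. The vocabulary, the left-padding scheme, the `max_len` value, and both helper functions are assumptions for illustration, not the actual pipeline. The integer IDs would then index into an embedding layer (for example, one initialized from word2vec vectors) ahead of the LSTM.

```python
def build_vocab(sequences):
    """Map each tool name to an integer ID; 0 is reserved for padding."""
    names = sorted({tool for seq in sequences for tool in seq})
    return {name: i + 1 for i, name in enumerate(names)}


def make_training_pairs(sequences, vocab, max_len=5):
    """Split each sequence into (left-padded context IDs, ID of the last tool)."""
    inputs, labels = [], []
    for seq in sequences:
        if len(seq) < 2:
            continue  # need at least one context tool plus a label
        ids = [vocab[tool] for tool in seq]
        context = ids[:-1][-max_len:]  # keep the most recent tools
        padded = [0] * (max_len - len(context)) + context
        inputs.append(padded)
        labels.append(ids[-1])
    return inputs, labels


sequences = [
    ("Input Data", "Filter", "Summarize"),
    ("Input Data", "Select", "Join", "Summarize"),
    ("Text Input",),  # too short to produce a training pair
]
vocab = build_vocab(sequences)
X, y = make_training_pairs(sequences, vocab, max_len=3)
```

Each row of `X` is a fixed-length context and each entry of `y` is the tool the classifier should predict from it.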
Using our trained model, we wanted to ask what types of workflow patterns are most strongly predictive of each type of tool. Using a holdout set of workflow data that the model had never seen, we identified sequences that were highly predictive for each tool type, and then used the Word Cloud Tool to qualitatively examine patterns that were enriched in those sequences.
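One simple way to picture this step: keep only the holdout sequences where the model assigned a high probability to the tool that actually came next, then count which tools appear in those sequences. The sketch below assumes a hypothetical record format, an arbitrary 0.8 probability threshold, and made-up probabilities. It produces raw frequency counts of the kind a word cloud visualizes and is not the analysis we actually ran.

```python
from collections import Counter, defaultdict


def context_tool_counts(holdout, threshold=0.8):
    """Count context tools among confidently predicted sequences, per target tool.

    holdout: iterable of (context_tools, true_next_tool, predicted_prob_of_true_tool).
    The threshold is an illustrative assumption.
    """
    counts = defaultdict(Counter)
    for context, target, prob in holdout:
        if prob >= threshold:
            counts[target].update(context)
    return counts


# Made-up holdout records for illustration only.
holdout = [
    (("Input Data", "Join"), "Summarize", 0.91),
    (("Input Data", "Filter", "Join"), "Summarize", 0.85),
    (("Input Data", "Select"), "Summarize", 0.30),  # below threshold, ignored
    (("Text Input", "Formula"), "Browse", 0.88),
]
counts = context_tool_counts(holdout)
```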
When it came time to integrate these recommendations into Designer, our first challenge was identifying the right place to put this experience, since there are a few places to start: the Tool Palette, the Canvas and Global Search. Based on early prototypes, design concepts and initial user testing, we settled on creating a brand-new category in the Tool Palette. We wanted to ensure feature discovery was high for new and experienced users alike. Experienced users already know the Tool Palette and new users must learn it, so it was the natural place to start. (This doesn’t mean the Canvas and Global Search will never get this kind of enhancement; we want your feedback on this!)
Creating a dynamic tool category was a new task for us, and we weren’t sure how it would go until we got deeper into development. Our tool palette is largely static, meaning it does not update on its own. Fortunately, we could reference a few existing experiences that do update the palette automatically (such as adding new categories to show). We then modified that approach so the category updates in response to quick user interactions.
With the front-end experience decided, the Recommended model engine needed to be integrated. The model accepts two primary parameters: the focused tool on the canvas and up to 30 upstream tools. Every time a user changes focus between tools, Designer sends this reduced version of the workflow, along with the newly focused tool, to the model engine, which analyzes it and sends back up to six tool recommendations. The general engineering design itself is straightforward.
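The request and response flow described above can be sketched roughly as follows. Only the two limits (30 upstream tools, six recommendations) come from this article. The `parents` mapping, the breadth-first ordering, and the `score_fn` stand-in for the trained model are hypothetical, so treat this as an illustration rather than Designer’s actual code.

```python
from collections import deque

MAX_UPSTREAM = 30        # from the article: up to 30 upstream tools
MAX_RECOMMENDATIONS = 6  # from the article: up to six recommendations


def upstream_tools(parents, focused, limit=MAX_UPSTREAM):
    """Collect up to `limit` tools upstream of `focused`, nearest first (assumed order)."""
    seen, order, queue = {focused}, [], deque([focused])
    while queue and len(order) < limit:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen and len(order) < limit:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


def recommend(parents, focused, score_fn):
    """Return up to six tool names ranked by a hypothetical scoring function."""
    context = upstream_tools(parents, focused)
    scores = score_fn(focused, context)  # stand-in for the trained model
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:MAX_RECOMMENDATIONS]


def dummy_scores(focused, context):
    # Placeholder scores; a real engine would run the model on the context.
    return {"Summarize": 0.4, "Select": 0.2, "Browse": 0.15, "Formula": 0.1,
            "Sort": 0.06, "Union": 0.05, "Sample": 0.04}


parents = {"Join": ["Input Data", "Filter"], "Filter": ["Text Input"]}
context = upstream_tools(parents, "Join")
recommendations = recommend(parents, "Join", dummy_scores)
```

Capping the context keeps each request small, so the category can refresh every time focus moves to a different tool.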
The end result is a smooth, dynamic tool category that updates as our users build their workflows. Our goal was to help new and experienced users discover new tools and categories. Alteryx Designer has over 270 tools available, making it a very powerful toolkit for solving data science and analytics problems.
We hope the Recommended tool category experience is helping you find new tools and solve analytic problems. Our plan is to continue collecting feedback and to explore new enhancements to the Canvas and Global Search that make learning our products easier for everyone. Please reach out to us with your feedback, or if you are interested in taking part in early user testing.
Do you have questions? What are your thoughts on the Recommended category so far? We’d love your feedback. Are there other aspects of Designer you’d like to see personalized for you? Let us know with a comment below, and subscribe to the blog to get future articles.