DrDan
Alteryx Alumni (Retired)

Recently Boris Evelson of Forrester Research raised some interesting questions about the use and integration of R within an overall BI/analytics workflow and platform. I posted a short reply on his blog post. In the meantime, I wanted to take a moment to walk through how we at Alteryx are integrating R into the Alteryx platform. For those of you who may not know, basic integration arrived with the R-tool in the 7.0 release of Alteryx in February, which allowed R scripts to be created and run in an Alteryx workflow. Closer integration, along the lines of the points he raised, was one of the main objectives of our recent Alteryx 7.1 release, in which we added 18 new R tools to the platform.

 

Going forward, the level of integration between Alteryx and R will continue to increase. Some key points to note on the use and integration of R within the broader context of Alteryx:

 

    • Alteryx 7.1 has point-and-click GUIs for a number of the most commonly used predictive analytics and business data mining methods. Specifically, these include linear regression, logistic regression, decision trees, random forest models, stepwise variable selection, K-centroids cluster analysis, and principal components analysis. In addition, there are tools to help a data artisan explore and understand the data she or he is working with, and a set of visual diagnostic tools (such as lift charts) to help a data artisan quickly compare and assess the performance of different models. The number of R-based methods with a GUI front-end will continue to expand. In the near future we’ll be introducing additional tools for time series forecasting and related time series methods.
    • Alteryx, via an open source R package, allows for the seamless movement of the Alteryx data stream into and out of R, maintaining all of the critical metadata about the data fields. This includes the ability to move spatial objects (e.g., trade area polygons) and time-date fields into R intact. In addition, object names that are acceptable in Alteryx but not in R are renamed to conform to R’s requirements, with the user being informed of the field name changes.
    • The R model summary report produced by one of our GUI tools is readily accessible to Alteryx’s reporting tools for easy creation of reports and dashboards. In addition, an R model object can be placed into an Alteryx data stream, which allows that object to be used further downstream in the current workflow for scoring or model assessment, or saved to disk and then imported into another workflow. For organizations that want to implement their own R-based functionality (by writing R scripts and bringing them into Alteryx using an R-tool node), we provide a number of tools (both Alteryx tools and R functions) that allow a developer to easily prepare R output for reporting using Alteryx’s reporting tools.
    • Currently model creation and scoring are done in the Alteryx server or the Alteryx designer desktop products. Longer term, we plan on providing the means to move model scoring into the backend database server for faster execution with even larger datasets.
    • Many of the features of the Alteryx and R integration mentioned above are directly related to model design, management, and execution capabilities. In addition, we are currently highly focused on creating tools that help to automate the model design and development phase, particularly as it relates to common predictive analytics tasks in the verticals we serve.
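
To make the field renaming point above a bit more concrete, here is a minimal sketch of what that kind of renaming can look like. It is written in Python purely for illustration and is not the code Alteryx uses. The exact rule Alteryx applies is not described in this post, so the sketch assumes a rule similar to R's own make.names function: invalid characters become dots, names that cannot start an R identifier get an "X" prefix, reserved words get a trailing dot, and collisions get a numeric suffix. The helper names to_r_name and rename_fields, and the sample field names, are hypothetical.

```python
import re

# R's reserved words; a name equal to one of these is not usable as-is.
R_RESERVED = {
    "if", "else", "repeat", "while", "function", "for", "next", "break",
    "in", "TRUE", "FALSE", "NULL", "Inf", "NaN", "NA",
}


def to_r_name(name):
    """Return an R-safe version of one field name (hypothetical helper).

    Simplification: only ASCII letters are treated as valid, whereas R
    itself accepts letters according to the current locale.
    """
    # Replace anything that is not a letter, digit, dot, or underscore.
    cleaned = re.sub(r"[^A-Za-z0-9._]", ".", name)
    # A valid R name starts with a letter, or with a dot not followed by a digit.
    if not re.match(r"^([A-Za-z]|\.(?![0-9]))", cleaned):
        cleaned = "X" + cleaned
    # Reserved words get a trailing dot so they no longer clash.
    if cleaned in R_RESERVED:
        cleaned += "."
    return cleaned


def rename_fields(fields):
    """Map each (assumed unique) field name to a unique R-safe name.

    Returns the full mapping plus a list of (old, new) pairs for the
    fields that actually changed, so the user can be told about them.
    """
    mapping = {}
    used = set()
    for field in fields:
        base = to_r_name(field)
        candidate, n = base, 0
        # Add a numeric suffix until the name no longer collides.
        while candidate in used:
            n += 1
            candidate = f"{base}.{n}"
        used.add(candidate)
        mapping[field] = candidate
    changes = [(old, new) for old, new in mapping.items() if old != new]
    return mapping, changes


if __name__ == "__main__":
    fields = ["Trade Area", "Trade.Area", "2013 Sales", "if", "Revenue"]
    mapping, changes = rename_fields(fields)
    for old, new in changes:
        print(f"Field '{old}' renamed to '{new}'")
```

Returning the list of changes separately from the mapping mirrors the behavior described above, where the user is informed of every field name that had to be altered while unchanged fields pass through silently.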

Predictive and strategic analytics remain a major focus in our product roadmap with more tools, complete solution packages, and apps planned in our upcoming releases. If there is a feature you'd like to see in Alteryx, please send us an email at "Products at Alteryx.com."

Dan Putler
Chief Scientist

Dr. Dan Putler is the Chief Scientist at Alteryx, where he is responsible for developing and implementing the product road map for predictive analytics. He has over 30 years of experience in developing predictive analytics models for companies and organizations that cover a large number of industry verticals, ranging from the performing arts to B2B financial services. He is co-author of the book, “Customer and Business Analytics: Applied Data Mining for Business Decision Making Using R”, which is published by Chapman and Hall/CRC Press. Prior to joining Alteryx, Dan was a professor of marketing and marketing research at the University of British Columbia's Sauder School of Business and Purdue University’s Krannert School of Management.
