Data Science

Machine learning & data science for beginners and experts alike.
mstarks
Alteryx Alumni (Retired)

As a fairly new developer on the Core Engines team, I am thrilled to have the opportunity to add functionality to the R Tool. While working on new features, I have taken some time to learn more about R and how it can be used to solve real-world problems.

 

According to the project's home page, www.r-project.org, R is a software environment for statistical computing. It allows users to define statistical models, perform analysis tasks, and plot results. Because R has been embraced by a large community of developers, its capabilities are continually expanding. In fact, over 3700 packages have been developed! These can be downloaded from the Comprehensive R Archive Network at cran.r-project.org. Fortunately, only a small subset of these packages is required for most business applications.

 

What statistical analysis approaches are important in the context of business intelligence? Predictive modeling can be used for prospecting, qualifying prospects, cross-selling, up-selling, analyzing attrition and churn, and detecting fraud. Grouping can be used for market basket analysis, recommendation systems, fraud detection, and customer segmentation. Data mining allows large amounts of data to be summarized in a way that supports decision-making.
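
To make the market basket idea concrete, here is a minimal sketch of the kind of summary such an analysis starts from: how often pairs of items appear together in the same purchase. It is written in plain Python purely for illustration (it is not R Tool code), and the baskets and the `pair_support` helper are hypothetical, invented for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]


def pair_support(baskets):
    """Return the fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort the items so each pair has one canonical (alphabetical) key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


support = pair_support(baskets)
for pair, share in sorted(support.items()):
    print(pair, share)
```

Pairs with high support are candidates for cross-selling or recommendations. A full association-rules analysis would go further than this sketch and also measure how strongly one item predicts another.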

 

Specific predictive modeling techniques include linear regression, logistic regression, decision trees, and random forests. Predictive models allow you to estimate the probability of a given behavior based on previously acquired data. Grouping methods include K-Centroids clustering and hierarchical cluster analysis, along with association rules. Interesting patterns can emerge when you find useful ways to group data.
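
As a rough illustration of the simplest technique in that list, the sketch below fits a linear regression with a single predictor using ordinary least squares and then uses the fitted line to make a prediction. It is plain Python rather than R, the `fit_line` helper is a hypothetical name, and the spend and sales figures are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor: y ~ intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope is the co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical history: advertising spend and units sold.
spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_line(spend, sales)

# Use the fitted model to estimate sales at a spend level not yet observed.
predicted = intercept + slope * 5.0
print(intercept, slope, predicted)
```

The other predictive techniques named above follow the same pattern: fit a model on previously acquired data, then apply it to new cases. Logistic regression, for example, produces an estimated probability instead of a numeric value.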

 

The initial work of bringing the capabilities of R into Alteryx was completed prior to the 7.0 Release. Here is a quick look at what can be done at this point.

 

The R Tool can be included directly in any module. It accepts multiple optional inputs, and the data on those incoming connections can be read within R. Users can write their own R scripts to perform statistical analysis. (The "R Tool Predictive Analytics" sample module provides an example of how this works.) The R Tool has two optional output connections. The left output is for writing data values, and the right output is for generating graphs. Most problems that can be solved using R can now be addressed within Alteryx.

 

What can you expect to see in the future? Dr. Dan Putler has created several macros that can be used to accomplish the most common predictive tasks. These are going to be made available under a new "Predictive Tools" category within Alteryx. If you want to invest in the power of R, these macros are going to provide useful analysis techniques and examples to help you get started. We are beginning to look at how the data artisan can easily incorporate the results of predictive analytics activities into the Alteryx workflow. Predictive Model Markup Language (PMML) is the industry-standard approach for defining and sharing data mining models. You can expect to see some PMML support within Alteryx this year. Also, the new Alteryx R Data Exchange package is going to allow R scripts to read and write data in the Alteryx YXDB format. Additional features are in the planning stages.

 

I am fascinated by the strong interest in R expressed by the Alteryx community. We are committed to making predictive analytics a seamless part of the Alteryx user experience. Please contact me or Dr. Dan Putler if you have specific questions or recommendations for improving the R Tool. Thanks!
