Data Science Blog

Machine learning & data science for beginners and experts alike.
Alteryx

Last Friday was a very busy day for several of us at Alteryx in the wake of the announcement that Microsoft had agreed to acquire Revolution Analytics. In this post I won't go into the Alteryx angle of this story, other than to say we think this is a net positive. Instead, I want to offer a few words of appreciation for what Revolution Analytics has contributed, both to R-based technology and to the R community in non-technical ways, since its creation (as REvolution Computing) in 2007.

 

Contributions to R Based Technology

Revolution Analytics has long been at the forefront of efforts to scale R for applications involving large amounts of data. They have approached this problem using both coarse-grained parallel and streaming computing approaches. As of now, considerably more effort is going into coarse-grained parallel computing (with Hadoop being the most widely publicized of these efforts), but streaming approaches can be very effective in scaling predictive analytics with more limited hardware resources. The most impressive methods of this kind that I have seen are the streaming linear model and generalized linear model methods contained in Revolution Analytics' (proprietary) RevoScaleR package (which also makes use of Intel's multi-threaded linear algebra libraries), which comes with their Revolution R Enterprise product. We have found that with moderate data volumes they are faster than the comparable open source R functions, and they can easily scale to millions of records on a common business laptop configuration (e.g., 8 GB of memory and a modern multicore CPU). That same configuration can estimate the same type of model on at most 100,000 to 200,000 records with fewer than 10 predictors using open source R's lm or glm functions. What Lee Edlefsen and the engineering team at Revolution Analytics have done in this area represents the state of the art (they clearly outshine comparable methods from SAS and IBM/SPSS), and will likely represent an important point of comparison for others developing streaming algorithms for a long time to come.
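To give a feel for why a streaming approach needs so little memory, here is a minimal sketch, written in Python purely for illustration. It is not RevoScaleR's algorithm, which is proprietary; it is one textbook way to fit a linear model in a single pass, by accumulating the cross-products X'X and X'y one chunk at a time and solving the normal equations at the end. The function names and the synthetic data source (`make_chunks`) are assumptions made up for this example.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

Row = Tuple[Sequence[float], float]  # (predictor values, response)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - s) / m[i][i]
    return beta


def streaming_ols(chunks: Iterable[List[Row]], n_predictors: int) -> List[float]:
    """Fit y = b0 + b1*x1 + ... while holding only one chunk in memory."""
    p = n_predictors + 1  # +1 for the intercept column
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for chunk in chunks:
        for predictors, y in chunk:
            x = [1.0, *predictors]
            for i in range(p):
                xty[i] += x[i] * y
                for j in range(p):
                    xtx[i][j] += x[i] * x[j]
    return solve(xtx, xty)


def make_chunks(n_rows: int, chunk_size: int) -> Iterator[List[Row]]:
    """Hypothetical data source: rows of y = 2 + 3*x1 - x2, yielded in chunks."""
    chunk: List[Row] = []
    for k in range(n_rows):
        x1, x2 = float(k % 7), float(k % 5)
        chunk.append(((x1, x2), 2.0 + 3.0 * x1 - x2))
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


coefficients = streaming_ols(make_chunks(1000, 100), n_predictors=2)
```

The memory needed for the accumulators depends only on the number of predictors, not on the number of records, which is the property that lets this style of method scale on modest hardware. A production implementation would likely use a more numerically stable update than the plain normal equations used here.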

 

While they have kept their streaming methods proprietary, they have given back to the R community much of the technology they have developed in the area of coarse-grained parallel computing methods in R. Chief among these are the foreach and the iterators packages. In academia, one thing professors are judged on in tenure and promotion decisions is how many other published articles cite their articles (there are a number of broad, discipline-oriented citation indexes, such as the Social Science Citation Index, that provide this information, and a lot of attention is now being paid to Google Scholar citations, which are often more interdisciplinary in nature). The R Project originated in academia, and still has a very academic feel to it. As a result, the package archive for the project (CRAN, or the Comprehensive R Archive Network) provides something very similar to a citation index. Specifically, for every CRAN package there is an indication of how other CRAN packages make use of it. There are three levels of this: a "reverse depends" (the other package requires the package and attaches it when loaded); a "reverse imports" (the other package requires the package and uses its functions internally, without attaching it for the user); and a "reverse suggests" (the package provides additional, less central functionality to the other package, which can be installed without it). The Revolution Analytics foreach package has (as I write this) a reverse status on the part of 111 other R packages, while the iterators package has a reverse status on the part of 34 other CRAN packages. Only three of these represent "vanity reverses" (i.e., a package that makes use of a package written by the same author), and the vast majority are of the more important "depends" or "imports" variety. In both cases, this represents a very high number of reverse status packages, and the count for the foreach package is extraordinarily high.
Put another way, if there were a University of R, Revolution Analytics would have the rank of Full Professor.
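The "citation index" idea can be made concrete with a small sketch, again in Python for illustration only. Given each package's declared Depends, Imports, and Suggests fields, inverting that mapping yields the reverse listing described above. The package names and metadata below are hypothetical and are not real CRAN data.

```python
from collections import defaultdict
from typing import Dict, List

Fields = Dict[str, List[str]]

# Hypothetical package metadata (illustrative only, not taken from CRAN).
PACKAGES: Dict[str, Fields] = {
    "pkgA": {"Depends": ["foreach"], "Imports": [], "Suggests": []},
    "pkgB": {"Depends": [], "Imports": ["foreach", "iterators"], "Suggests": []},
    "pkgC": {"Depends": [], "Imports": [], "Suggests": ["foreach"]},
    "foreach": {"Depends": [], "Imports": ["iterators"], "Suggests": []},
}


def reverse_index(packages: Dict[str, Fields]) -> Dict[str, Fields]:
    """Invert declared dependencies: for each package, list who uses it and how."""
    rev: Dict[str, Fields] = defaultdict(
        lambda: {"Depends": [], "Imports": [], "Suggests": []}
    )
    for name, fields in sorted(packages.items()):
        for kind, targets in fields.items():
            for target in targets:
                rev[target][kind].append(name)
    return dict(rev)


index = reverse_index(PACKAGES)
# Total "reverse status" count for one package, across all three levels.
foreach_total = sum(len(users) for users in index["foreach"].values())
```

Counting the entries for a package across all three levels gives the kind of total quoted above for foreach and iterators.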

More recently, Revolution Analytics has made an effort to address issues that come up with R packages. The first of these efforts is the miniCRAN package, which allows an organization with strict firewall rules to create an internal, selective archive of R packages that members of that organization can access. The second is the checkpoint package, which is closely linked with Revolution Analytics' "Managed R Archive Network", or MRAN. The purpose of the combination of the checkpoint package and MRAN is to address a common problem in reproducing R-based research results, namely changes in contributed R packages. R consists of three components: a small set of "base" packages that provide basic functionality; a still very small, but somewhat larger, set of "recommended" packages that provide additional core R functionality; and a huge set (nearly 5000 as of this writing) of contributed packages. R's base and recommended packages are shipped with R's installer from CRAN, and are very stable. The same cannot be said of all of R's contributed packages. We at Alteryx have never had issues migrating to the base or recommended packages of a new version of R, but we have experienced a few hiccups in migrating to some new versions of contributed packages that we use and bundle with our Predictive Plug-in (yes, regression testing is a useful thing). It turns out we are not alone, and in some cases (particularly in clinical trial settings for new drugs or medical devices) such changes can make research results difficult to reproduce. The problems can be due to changes in the API of a package (which can cause R analysis scripts to break) or changes in the underlying methods used by a package (which can change the nature of the results in marginally significant cases).
The goal of the checkpoint package / MRAN combination is to allow researchers to "freeze" on a particular vintage of R packages in order to make sure past research results can be replicated in a setting that takes changes in underlying R packages out of the picture. I view this as a very selfless move on Revolution Analytics' part, since it is a technology that is likely to be extremely useful to portions of the R community and takes real resources on the part of Revolution Analytics to implement, but is one that seems difficult for them to monetize.
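The "freeze" idea amounts to resolving every package to the newest version that existed on a chosen date. The sketch below shows that selection rule in Python. It is an illustration of the concept only, not how checkpoint or MRAN are implemented, and the release history is hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical release history: package -> list of (release date, version).
RELEASES: Dict[str, List[Tuple[date, str]]] = {
    "pkgA": [
        (date(2014, 3, 1), "1.0"),
        (date(2014, 11, 15), "1.1"),
        (date(2015, 1, 20), "2.0"),
    ],
    "pkgB": [
        (date(2014, 12, 30), "0.9"),
        (date(2015, 2, 1), "1.0"),
    ],
}


def snapshot(
    releases: Dict[str, List[Tuple[date, str]]], as_of: date
) -> Dict[str, str]:
    """Pick, for each package, the latest version released on or before as_of."""
    frozen: Dict[str, str] = {}
    for name, history in releases.items():
        eligible = [(released, version) for released, version in history
                    if released <= as_of]
        if eligible:  # packages that did not exist yet are simply absent
            frozen[name] = max(eligible)[1]
    return frozen


frozen = snapshot(RELEASES, date(2014, 12, 31))
```

Rerunning an analysis against the same snapshot date always resolves to the same package versions, so later API or method changes in those packages cannot alter the results.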

 

Non-Technology Contributions to the R Community

Revolution Analytics has consistently given back to the R community on a non-technological basis in three ways. First, it has been a primary sponsor of the annual international R user group conference (UseR!) since 2008, longer than any other software vendor; only the book publishers CRC Press and Springer have sponsored the conference for more years than Revolution Analytics. Since Revolution Analytics was only founded in 2007, the length of time they have been a primary sponsor of UseR! is remarkable.

 

The second way Revolution Analytics has given back to the R community in a non-technical way is by helping to sponsor local R user groups through their R User Group Sponsorship Program. I am a member of the Bay Area R Users Group, which Revolution Analytics sponsors, and Joe Rickert of Revolution Analytics acts as the primary organizer. Revolution Analytics provided financial support to 51 local R user groups in 2014 (all local R user groups are eligible for sponsorship, but not all apply). In addition, it supports all 150 local R user groups via the Local R User Group Directory, the R Community Calendar, and the @inside_r Twitter channel.

 

The third way they contribute back to the community is the Revolutions blog, which is one of the longest-running blogs covering topics relevant to the R community. Most company blogs are done for specific, very narrow marketing or product education purposes. However, this is not the case with the Revolutions blog, which strives to cover all topics relevant to the R community, even new R-based technologies that represent, at least to my mind, a potential competitive threat to them.

 

Going Forward

 

What exactly the longer-term future holds for Revolution Analytics as they become part of the Microsoft family is unknown at this point. However, I believe that the assessment of David Smith (Revolution Analytics' Chief Community Officer) that

For our users and customers, nothing much will change with the acquisition. We’ll continue to support and develop the Revolution R family of products — including non-Windows platforms like Mac and Linux. The free Revolution R Open project will continue to enhance open source R. We’ll continue to offer expert technical support for R with Revolution R Plus subscriptions from the same team of R experts. We’ll continue to advance the big data and enterprise integration capabilities of Revolution R Enterprise. And we’ll continue to offer expert technical training and consulting services.

 

is correct. Moreover, the financial backing of Microsoft will likely provide a strong tailwind, helping several of the initiatives that Revolution Analytics started to move forward more rapidly.

 

As part of Alteryx's partnership with them, I've had the opportunity and pleasure to interact with many people at Revolution Analytics, and I wish them the best of luck in the next part of their journey.

Dan Putler
Chief Scientist

Dr. Dan Putler is the Chief Scientist at Alteryx, where he is responsible for developing and implementing the product road map for predictive analytics. He has over 30 years of experience in developing predictive analytics models for companies and organizations that cover a large number of industry verticals, ranging from the performing arts to B2B financial services. He is co-author of the book, “Customer and Business Analytics: Applied Data Mining for Business Decision Making Using R”, which is published by Chapman and Hall/CRC Press. Prior to joining Alteryx, Dan was a professor of marketing and marketing research at the University of British Columbia's Sauder School of Business and Purdue University’s Krannert School of Management.


Comments
Asteroid

Hi Dan!

 

Could you provide an update on the Revolution Analytics and Alteryx relationship now that we are 9 months out? I'm looking for specific RA tools for larger datasets that my 8GB/8core computer can't handle (~800,000 rows).

 

Kai 🙂

Alteryx

Hi Kai,

 

We are close to launching an update to our predictive tools that will re-enable the ability to use Revolution R Enterprise with Alteryx. The dust needed to settle a little bit after the Microsoft acquisition of Revolution, as they worked through packaging issues, so integration wasn't possible out of the box with RRE 7.4.1 and Alteryx 10.0. Those issues have now been worked out, so expect an announcement shortly on the predictive update.

 

Dan

Atom

Hello, I am newcomer on Alteryx and Revolution R. The couple looks stable today, after a couple of months/year of marriage 🙂

I am positioning it as predictive analytics at marketing department in insurance company in Europe.

I have pragmatic questions: is the installation bundled or separate? is there additional cost for adding Revolution R to Alteryx?

 

Alteryx

@burnayj sorry for the slow reply, this is a somewhat older blog post, so it took a while to stumble onto your comment. Currently, they are not bundled together. The Alteryx predictive tools look for Revolution R Enterprise (soon to be Microsoft R Services) and know to work with it as Alteryx's R engine if it is available.

Alteryx Partner

@DrDan Greetings from Brazil.

 

I realize that Alteryx flows' speed considerably reduces when it reaches some "regular R" processing, probably due to R in-memory limitations...

 

So I tried to install Alteryx Predictive Tools for Revolution Analytics RRE 7.4.1 package, just as it is in Alteryx download page (http://downloads.alteryx.com/predictive.html)...

 

However, I could not find RRE 7.4.1, since it is now Microsoft R Open and not Revolution R Open anymore...

 

I searched for RRE 7.4.1, but now versions are newer, I found Microsoft R Open (MRO) 3.2.3 (seems to follow R version numbering) and also RRO 8.0.3, I think it was the last edition created by Revolution Analytics...

 

So, I'm confused...

 

How can I use some of these "Turbo R" with Alteryx? First of all, which of them should I install (RRO 8.0.3 [older] or MRO 3.2.3 [newer])? Must I install it prior to installing Alteryx? Can this Alteryx Predictive Tools for Revolution Analytics RRE 7.4.1 package recognize RRO 8.0.3 or MRO 3.2.3?

 

Can you please shed some light on it?

 

Thank you for your attention. Regards,

 

Bruno.

Alteryx

@Bruno_Pasquini, we will be supporting RRE 8.0 in the next Alteryx release. One issue will be getting RRE 8.0. Right now Microsoft is internally working out how they will handle licensing and distribution of the RRE 8.0 (at some point it will become Microsoft R Server, even running on a standalone workstation). When they get things settled, we will be in a better position to say what we will be doing going forward.

 

Dan

Asteroid
Good information, @DrDan.

Revolution seems to think this already integrates perfectly with Alteryx. I take it this is not necessarily the full skinny.

Kai 🙂
--
Kai R. Larsen
Principal, Human Behavior Project, http://www.theorizeit.org
Leeds School of Business
University of Colorado
995 Regent Dr.
Boulder, CO 80309
Alteryx

It does with RRE 7.4.1, and we do try to make updates available in a timely manner within the confines of our release schedule. In terms of things going forward, my guess is that things will be nailed down sooner rather than later.

Asteroid

Thanks @DrDan,

 

the University tech folks have kindly installed RRE. How can I evaluate whether it is actually installed properly?  I thought there would be new tools showing up, but perhaps it just swaps out the code on some of the standard predictive tools Alteryx calls in R?  

 

 

Alteryx

@KaiLarsen: The only two tools that are added with RRE / Microsoft R Server support are the XDF Input and XDF Output tools. Many of the predictive tools check whether the incoming data is from an XDF file, and make use of RevoScaleR capabilities if that is the case. This guide should help to get you started: http://downloads.alteryx.com/Documentation/Alteryx%20and%20Revolution%20Analytics%20Integration%20Gu...

 

Dan