Analytics

GeorgeM · ‎06-26-2015

It was our pleasure this past week to participate in SparkSummit West and to have Alteryx included in the speaking lineup. The growth of SparkSummit has been breathtaking (doubling in each of the past two years to 2,000 attendees now) and certainly reflects the energy of Spark as a movement in the Big Data ecosystem.

Alteryx became focused on Spark, and specifically SparkR, as some of our largest customers needed scale-out capabilities for their predictive analytics projects. While Hadoop offered the opportunity to take advantage of petabyte-scale data management, we quickly realized that MapReduce was not built for interactive analytic workloads. Thus, it became apparent that RMR (R-MapReduce) was not likely to be useful for either compute scale-out or ease of use.

That’s where our relationship with the Spark community has evolved quickly over the last 18 months. Though our initial work with Spark focused on understanding the performance of the Spark SQL optimizer, we got excited about the potential of SparkR becoming a first-class citizen of Spark while also becoming more enterprise-ready. This drove our decision to embrace Spark, Spark SQL, SparkR, the DataFrame API and now SparkML/MLlib.

As Patrick Wendell said, "The end-goal of a lower-level platform is to disappear, so people are able to focus on solving problems." We are delighted to see the power of a business-user-driven experience like Alteryx converge with the power of Spark as a general-purpose analytic computing platform. As platforms like Alteryx and Spark continue to proliferate, I see not only a new analytics stack emerging, but also the underpinnings of how next-gen analytic apps can be easily built and deployed at scale. My day-two fireside chat with Arsalan Tavakoli was an incredible opportunity to share these thoughts with the broader Spark community.

The afternoon session on SparkR led by Chris Freeman of Alteryx was a tour de force. He took the stage and demonstrated how both SparkR and the DataFrame API have evolved over the last few months: https://github.com/cafreeman/SparkR_DataFrame_Demo/blob/master/SS_DataFrame_Slides.pptx. The emergence of SparkR and the DataFrame API has awakened another 2M+ users in the R community to the power of Spark.

SparkSummit was an extremely well-run event for the 2,000 attendees converging at the San Francisco Hilton. Much credit goes to Kavitha Mariappan and the rest of the Databricks marketing team for running the show like clockwork. Special thanks to everyone else at Databricks and AMPLab who has been part of our Spark journey, including Ion, Matei, Patrick, Ali, Shivram, Arsalan, Andy, and Scott.

For many of us at Alteryx and other organizations contributing to Spark, it is not just a movement, but a labor of love. We are looking forward to our follow-up work on Spark 1.5, including the integration of SparkR with SparkML/MLlib. In the meantime, check out my #Inspire15 keynote, where Ion joins me onstage to discuss the future of Spark.

Spark – The Future of Analytical Computing