Analytics

JCR
Alteryx Alumni (Retired)

Spark Summit 2014 took place this week, and the event doubled in size from the previous summit, held only six months ago. This says a lot about the attention and traction Apache Spark has gained in the industry. An entire track was also dedicated to applications built on Spark, which suggests that Spark is moving out of the labs and taking its first steps in the enterprise world as a technology already capable of delivering business value.

Spark is an advanced in-memory data processing platform that includes SQL, streaming, machine learning and graph processing. This means advanced analytics and data transformations on very large volumes of data, in place and at interactive speed. It also means that, over time, a single, simpler stack will unify the capabilities of new data platforms, drawing on a variety of data sources including HDFS, Amazon S3 and Cassandra.

At Alteryx we believe this is exciting news for our customers, and we see it as a way to empower data analysts with additional data platforms. As the volume and variety of data continue to grow exponentially, new data platforms such as Apache Spark provide the compute framework needed to address data blending and advanced analytics at scale. This is why we announced a partnership with Databricks this week to help bring the best of Spark to the market.

While all Spark projects are exciting, two of them are of special interest to us on behalf of the data analyst:

Spark SQL is the fastest-growing component of Spark. It brings the notion of schema to Spark, along with the SQL language, and its SQL support will keep extending towards SQL-92. This has many implications, one of which is bringing together traditional relational databases and the new data formats from the Hadoop and NoSQL worlds. This opens up remarkable possibilities for addressing new data blending requirements, and we want to make it easy for data analysts to take advantage of them.
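
To make the idea of "schema plus SQL" a little more concrete, here is a minimal sketch of blending a relational table with schemaless JSON records through a single SQL query. It is only an illustration of the concept: it uses Python's built-in sqlite3 module as a stand-in, not the Spark SQL API, and the table names, sample records and the `blend` helper are all hypothetical.

```python
import json
import sqlite3

# Hypothetical semi-structured records, as they might arrive from a
# NoSQL store or a log file (illustrative data only).
RAW_EVENTS = [
    '{"customer_id": 1, "event": "click", "amount": 0}',
    '{"customer_id": 2, "event": "purchase", "amount": 40}',
    '{"customer_id": 1, "event": "purchase", "amount": 25}',
]


def blend(raw_events):
    """Join schemaless JSON events with a relational customer table in SQL.

    sqlite3 is used here purely as a stand-in for a SQL engine; this is
    not Spark SQL.
    """
    conn = sqlite3.connect(":memory:")

    # The "traditional relational" side: a conventional table.
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")]
    )

    # Impose a schema on the JSON records so they can be queried like a table.
    conn.execute(
        "CREATE TABLE events (customer_id INTEGER, event TEXT, amount REAL)"
    )
    parsed = (json.loads(line) for line in raw_events)
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [(r["customer_id"], r["event"], r["amount"]) for r in parsed],
    )

    # One SQL query now spans both sources.
    rows = conn.execute(
        "SELECT c.name, SUM(e.amount) "
        "FROM customers c JOIN events e ON e.customer_id = c.id "
        "WHERE e.event = 'purchase' "
        "GROUP BY c.name ORDER BY c.name"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, total in blend(RAW_EVENTS):
        print(name, total)
```

The point of the sketch is the shape of the workflow rather than the engine: once semi-structured records carry a schema, they can be joined with relational data in ordinary SQL, which is the kind of blending described above.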

The second project is SparkR. Alteryx has already chosen R to deliver advanced analytics capabilities to our users through easy drag-and-drop R macros, which means no programming for the analyst. The SparkR project aims to provide direct access to Spark from R. For Alteryx users, this means tapping the power and scalability of Spark from their familiar language and tools. Continuing our support for and commitment to the open source community, Alteryx is partnering with Databricks to bring SparkR to release.

By focusing on these two projects with our partner Databricks, we will soon bring our customers extensive new capabilities for running data blending and advanced analytics workflows on any type of data, at scale. All of this comes with the same intuitive workflow that our users enjoy, empowering data analysts to reach deeper insights in hours rather than the weeks typical of traditional approaches.