To coincide with Strata + Hadoop World, we just released a short video on Big Data and Data Blending. This video is moderated by Ben Lorica, Chief Data Scientist, O'Reilly Media, and Program Director of Strata + Hadoop World. Ben interviews Amr Awadallah, Founder and CTO of Cloudera, and George Mathew, President and COO of Alteryx.
In this session, Ben leads the discussion through the evolution of Hadoop and how that evolution is broadening access to Hadoop for a new cadre of business users. Users without coding skills can now access the rich data stored in Hadoop, use the power of Hadoop to blend that data into targeted datasets that answer specific questions, and quickly iterate through questions and datasets to develop new insights.
These users are not waiting on others to build queries and return results – these data analysts are able to dive right in and get their answers immediately.
There are two technological changes that are driving this access for data analysts.
Moving from MapReduce to memory-based interfaces such as Spark and Impala as a SQL optimizer
To quote George from this video:
“What’s incredible to see today is those individuals who were limited to Excel literally landing information on HDFS where they have structured, semi-structured, even unstructured data that might be literally put into the file system. Now more workloads are possible in Hadoop than ever before, so you can have a very solid SQL optimizer sitting directly on top of a Hadoop infrastructure; you can have the ability to stream information that you couldn’t necessarily do out of the box with MapReduce in its first instantiation. And this is where we see the analytical platforms of the future emerge – where more and more users can participate and be more democratized in terms of the use of data and analytics, even at the petabyte scale we’re seeing in the Hadoop world today.”
Enabling Business Users With Tools Like Alteryx
Amr picks up the second part by saying:
“Seven years ago, when Hadoop came out, the only options were MapReduce and SGF and yes, it was absolutely limited to the most technical people that knew how to program in Java. But now there’s a very rich ecosystem of tools like Alteryx that know how to speak with Hadoop, and some of these go directly against Spark or the MapReduce engines, and some of these go through ODBC and JDBC SQL engines. But we have this very rich set of tools now that allow normal business users to access the data in Hadoop. So yes, it’s absolutely happening – the move is happening as we speak.”
For teams that instantiate and manage Hadoop systems, this means their systems see more use and deliver more value, because the business can now unlock the insights those systems hold.