Two weeks ago we co-hosted a webinar with Hortonworks entitled “3 Key Steps of Hadoop-based Analytics.” John Kreisa (@marked_man) and I talked about how business users can understand Hadoop and get started using Hadoop-based analytics to analyze new kinds of data that were previously considered unusable or too costly to analyze. The theme of the webinar was based on our joint whitepaper “The Business Analyst’s Guide to Hadoop.”
We had some really great questions at the end of the webinar, so we thought it would be helpful to write up a couple of selected ones and post them here.
What are some of the Hadoop Analytics use cases in the communications industry?
The communications industry can be divided into wireless, wireline and cable segments. There are great analytics examples in all three segments. As an example, in the wireless segment we see use cases that:
You can find out more on our Communications page, or you can view this video by Mary Coffee of Agileyx Labs, an Alteryx customer.
Could you please elaborate on the geo-spatial capabilities of Hadoop: storage, raster, vector, limitations, speed?
Hadoop doesn’t have geo-spatial analytics built in natively. However, it can store any kind of geospatially referenced data, and when combined with the geo-spatial capabilities of Alteryx, it becomes formidable. Alteryx can take address data and geo-code it to latitude and longitude. It can map those locations and calculate drive times (actual drive-time distances, not just radius distances) from customers to a retail location, for instance. Alteryx has specialized in these types of spatial analytics for over 10 years now, and can make these calculations very quickly.
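To make the distinction concrete, here is a minimal Python sketch of the simpler of the two measures, a straight-line "radius" distance between geo-coded points, using the standard haversine formula. This is not how Alteryx computes anything; the function names, the coordinates and the 10 km radius are all made up for illustration. A true drive-time calculation needs a road network and speed data, which this sketch does not attempt.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle ("as the crow flies") distance between two points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def customers_within_radius(store, customers, radius_km):
    """Return names of customers whose straight-line distance is <= radius_km."""
    store_lat, store_lon = store
    return [
        name
        for name, (lat, lon) in customers.items()
        if haversine_km(store_lat, store_lon, lat, lon) <= radius_km
    ]


# Hypothetical geo-coded locations (latitude, longitude).
store = (39.7392, -104.9903)
customers = {
    "near": (39.7500, -105.0000),
    "far": (40.0150, -105.2705),
}

nearby = customers_within_radius(store, customers, radius_km=10.0)
print(nearby)
```

A radius like this ignores rivers, highways and one-way streets, which is why a drive-time measure can give a very different picture of which customers a location really serves.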
What specific use cases do you have related with financial services?
Alteryx supports many use cases in the financial services industry. Do you need to isolate and highlight the sources of highest risk? Determine flood certification coverage? Assess the impact of a potential merger? Alteryx can handle any of these.
On the Alteryx Analytics Gallery, there is an analytics app that allows users to select two different businesses and produce a report to show the competitive impact a merger between the two businesses would have by calculating the HHI (Herfindahl-Hirschman Index). HHI is a commonly accepted measure of market concentration.
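For readers who have not met the HHI before, it is the sum of the squared market shares of every firm in a market, with shares usually expressed in percent. The short Python sketch below shows that arithmetic for a hypothetical five-firm market before and after two firms merge. It is not the Gallery app's implementation; the function names and the share figures are invented for illustration.

```python
def hhi(shares_percent):
    """Herfindahl-Hirschman Index: sum of squared market shares (in percent)."""
    return sum(share ** 2 for share in shares_percent)


def post_merger_shares(shares, firm_a, firm_b):
    """Return a new share table in which firm_a and firm_b are combined."""
    merged = {
        name: share
        for name, share in shares.items()
        if name not in (firm_a, firm_b)
    }
    merged[f"{firm_a}+{firm_b}"] = shares[firm_a] + shares[firm_b]
    return merged


# Hypothetical market shares in percent (they sum to 100).
shares = {"A": 30.0, "B": 25.0, "C": 20.0, "D": 15.0, "E": 10.0}

before = hhi(shares.values())                                # 2250.0
after = hhi(post_merger_shares(shares, "A", "B").values())   # 3750.0
delta = after - before                                       # 1500.0

print(f"HHI before: {before}, after: {after}, change: {delta}")
```

Note that the change equals twice the product of the merging firms' shares (2 x 30 x 25 = 1500), so a merger between two large players moves the index far more than one between two small ones.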
Generally, we have seen Hadoop used for risk modeling and fraud identification, trade performance analytics, surveillance and fraud detection, customer risk analysis, and real-time upsell and cross-sell offers.
How does one go about improving performance 100x? And what are the limits to this? (Not all queries may be made this fast.)
The effort to make Apache Hive 100x faster comes from a community effort called the Stinger initiative, which involves three phases of improvements. The Hive community is well along that path: it is now in the second phase, and initial tests have already shown 50x speed improvements.
A blog post on the Stinger initiative lays out the details on the Hortonworks site: http://hortonworks.com/blog/100x-faster-hive/
Additionally, there was a session at the recent Hadoop Summit called “An In-depth Look at Putting the Sting in Hive,” which has the latest update on the project. The slides and video of the session will be posted shortly.
What are the most common new data types optimized by Hadoop?
There are six types of data we typically see being processed in Hadoop. These are processed individually or in combination with traditional structured information and are enabling new kinds of analytic applications to be built.
Click here to view the webinar. To learn more about Alteryx, Hortonworks and Hadoop-based analytics, visit http://www.alteryx.com/hortonworks
Brian Dirking
Director of Product Marketing.