BrianD
Alteryx Alumni (Retired)

Two weeks ago we co-hosted a webinar with Hortonworks entitled “3 Key Steps of Hadoop-based Analytics.” John Kreisa (@marked_man) and I talked about how business users can understand Hadoop and get started using Hadoop-based analytics to analyze new kinds of data that were previously considered unusable or too costly. The webinar was based on our joint whitepaper “The Business Analyst’s Guide to Hadoop.”

 

We had some really great questions at the end of the webinar, so we thought it would be helpful to write up a few selected ones and post them here.

 

What are some of the Hadoop Analytics use cases in the communications industry?

The communications industry can be divided into wireless, wireline and cable segments. There are great analytics examples in all three segments. As an example, in the wireless segment we see use cases that:

 

  • Determine the wireless markets with the highest potential for your business model based on communications industry-specific business and residential spending projections
  • Define your 4G network deployment plans based on where your highest-value customers live and work
  • Acquire new customers with targeted marketing campaigns based on demographic data and competitive contract status
  • Decrease customer churn by taking all customer interactions with your service into account and identifying at-risk customers
  • Solve customer experience issues faster by proactively identifying potential network problems
  • Evaluate merger and acquisition targets and competitive threats to determine strategic impact to your embedded network
  • Identify optimal sites for retail locations, and assess the performance of each store and its products

 

You can find out more on our Communications page, or you can view this video by Mary Coffee of Agileyx Labs, an Alteryx customer.

Could you please elaborate on the geo-spatial capabilities of Hadoop: storage, raster, vector, limitations, speed?

 

Hadoop doesn’t natively have geo-spatial analytics built in; however, it can store any kind of geospatially referenced data, and combined with the geo-spatial capabilities of Alteryx it becomes formidable. Alteryx can take address data and geo-code it to latitude and longitude. It can map those locations and calculate drive times – not just radius distances but actual drive-time distances – from customers to a retail location, for instance. Alteryx has specialized in these types of spatial analytics for over 10 years and can make these calculations very quickly.
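
To make the "radius distance" idea concrete, here is a minimal Python sketch of a straight-line (great-circle) distance check between geo-coded points. It is not Alteryx's API or method: the function names, the store location, and the customer coordinates are all hypothetical, and the haversine formula is simply a common way to compute such a distance. A true drive-time calculation would also need road network data, which this sketch does not attempt.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def customers_within_radius(store, customers, radius_km):
    """Return names of customers whose straight-line distance to the store is within radius_km."""
    store_lat, store_lon = store
    return [
        name
        for name, (lat, lon) in customers.items()
        if haversine_km(store_lat, store_lon, lat, lon) <= radius_km
    ]


# Hypothetical geo-coded data: one store and two customers.
store = (0.0, 0.0)
customers = {"a": (0.0, 0.05), "b": (0.0, 1.0)}
nearby = customers_within_radius(store, customers, radius_km=10)
print(nearby)
```

A radius check like this treats every direction as equally reachable, which is why drive time is usually the more useful measure for retail trade areas: two customers at the same radius can be very different distances away by road.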

 

What specific use cases do you have related to financial services?

Alteryx provides many use cases in the financial services industries. Do you need to isolate and highlight the sources of highest risk? Determine flood certification coverage? Assess the impact of a potential merger? Alteryx can handle any of these.

 

On the Alteryx Analytics Gallery, there is an analytics app that allows users to select two different businesses and produce a report to show the competitive impact a merger between the two businesses would have by calculating the HHI (Herfindahl-Hirschman Index). HHI is a commonly accepted measure of market concentration.
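
As a rough illustration of the calculation behind such a report, the HHI is the sum of the squared market shares of all firms in a market, with shares expressed in percent. The Python sketch below is not the Gallery app's implementation; the function names and the market shares are hypothetical and chosen only to show the arithmetic.

```python
def hhi(shares_pct):
    """Herfindahl-Hirschman Index: sum of squared market shares (shares in percent)."""
    return sum(s ** 2 for s in shares_pct)


def merger_hhi_change(shares_pct, i, j):
    """Return (pre, post, delta) HHI if the firms at indexes i and j merge."""
    pre = hhi(shares_pct)
    # Drop the two merging firms and add one firm with their combined share.
    merged = [s for k, s in enumerate(shares_pct) if k not in (i, j)]
    merged.append(shares_pct[i] + shares_pct[j])
    post = hhi(merged)
    return pre, post, post - pre


# Hypothetical market of five firms (shares sum to 100 percent).
shares = [30, 25, 20, 15, 10]
pre, post, delta = merger_hhi_change(shares, 1, 2)
print(pre, post, delta)  # 2250 3250 1000
```

The increase equals twice the product of the merging firms' shares (2 × 25 × 20 = 1000 here), so a merger between two large players raises concentration far more than one between two small ones.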

 

Generally, we have seen Hadoop used for risk modeling and fraud identification, trade performance analytics, surveillance and fraud detection, customer risk analysis, and real-time upsell and cross-sell offers.

 

How does one go about improving performance 100x? And what are the limits to this (not all queries may be made this fast)?

The effort to make Apache Hive 100x faster comes from a community effort called the Stinger initiative, which involves three phases of improvements. The Hive community is well along this path and is now in the second phase, with initial tests already showing 50x speed improvements.

 

A blog post on the Hortonworks site lays out the details of the Stinger initiative: http://hortonworks.com/blog/100x-faster-hive/

Additionally, there was a session at the recent Hadoop Summit called “An In-depth Look at Putting the Sting in Hive,” which has the latest update on the project. The slides and video of the session will be posted shortly.

 

What are the most common new data types optimized by Hadoop?

There are six types of data we see typically being processed in Hadoop. These are processed individually or in combination with traditional structured information and are enabling new kinds of analytic applications to be built.

 

  1. Sentiment
    Understand how your customers feel about your brand and products – right now
  2. Clickstream
    Capture and analyze website visitors’ data trails and optimize your website
  3. Sensor/Machine
    Discover patterns in data streaming from remote sensors and machines
  4. Geographic
    Analyze location-based data to manage operations where they occur
  5. Server Logs
    Research logs to diagnose process failures and prevent security breaches
  6. Unstructured (text, video, pictures, etc.)
    Understand patterns in text across millions of web pages, emails, and documents

 

Click here to view the webinar. To learn more about Alteryx, Hortonworks and Hadoop-based analytics, visit http://www.alteryx.com/hortonworks

 

Brian Dirking

Director of Product Marketing.