Data Science

SeanL · ‎11-16-2015

The network analysis tool provides a way to visually interact with all kinds of data (in Alteryx version 10.0, check out Help -> Sample Workflows -> Predictive Analytics -> 24 Network Analysis Tool). To run a network analysis, you must provide a very specific form of input data: a list of network nodes and a list of edges. How do we get useful nodes and edges?

I set out to explore this question with one of my favorite datasets from OpenBeerDB.com:

Name	Category	Style	IBU (bitterness)	SRM (color)	ABV (alcohol content)
Coffee Stout	North American Ale	American-Style Stout	30	47	5
Oktoberfest	German Lager	German-Style Oktoberfest	93	-	7
...	...	...	...	...	...

My goal: To identify products (in this case beers) that are similar based on attributes (bitterness, color, alcohol content).

This type of analysis is similar to cluster analysis - so as a baseline I started by performing a standard k-means clustering analysis, which grouped my 18 beers into 3 groups:

[Figure: k-means clustering results, 18 beers in 3 groups]

You can see dark stouts getting identified together (group 2) and the bitter and alcoholic IPAs are also classified as similar (group 1).
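For readers who want to see the baseline outside of Alteryx, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The beer names and values are invented for illustration and are not taken from OpenBeerDB. The starting centroids come from a simple farthest-point heuristic so that the result is deterministic, and the features are left unscaled. This is not necessarily how the Alteryx clustering tool initializes or scales its inputs.

```python
import math

# Hypothetical beers as (IBU, SRM, ABV); values are illustrative only.
beers = {
    "Stout A": (30, 47, 5.0),
    "Stout B": (35, 45, 6.0),
    "IPA A": (70, 10, 7.0),
    "IPA B": (75, 12, 7.5),
    "Lager A": (15, 4, 4.5),
    "Lager B": (18, 5, 5.0),
}


def farthest_point_init(points, k):
    """Pick k starting centroids: the first point, then repeatedly the
    point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Return (labels, centroids) after running Lloyd's algorithm."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


names = list(beers)
labels, centroids = kmeans([beers[n] for n in names], k=3)
groups = dict(zip(names, labels))
for name, label in groups.items():
    print(f"{name}: group {label}")
```

Because bitterness and color span much larger ranges than alcohol content, they dominate the distance here. Standardizing each attribute first is a common choice when that is not what you want.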

However, this analysis lacks two things:

An interactive way to explore the results.
Metrics to understand how specific beers relate to one another and to the network of beers as a whole.

To further the analysis, I created a dataset in which each product (beer) was a node. Here is the key: I identified links between the nodes using the nearest neighbor tool. The nearest neighbor tool identifies the k closest neighbors based on Euclidean distance between points (beers) in an n-dimensional space (what I will call "beer space": a three-dimensional space defined by bitterness, color, and alcohol content):

[Figure: nearest neighbors in "beer space"]
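The nearest neighbor step can be sketched in a few lines of Python. This is an illustration of the idea rather than the Alteryx tool itself: the beers and their values are hypothetical, `knn_edges` is a name made up for this example, and distances are computed on the raw, unstandardized attributes.

```python
import math

# Hypothetical points in "beer space": (IBU, SRM, ABV).
beers = {
    "Stout A": (30, 47, 5.0),
    "Stout B": (35, 45, 6.0),
    "IPA A": (70, 10, 7.0),
    "IPA B": (75, 12, 7.5),
    "Lager A": (15, 4, 4.5),
    "Lager B": (18, 5, 5.0),
}


def knn_edges(nodes, k):
    """Return one (source, target, distance) edge for each of the
    k nearest neighbors of every node, using Euclidean distance."""
    edges = []
    for name, coords in nodes.items():
        # Distance from this beer to every other beer.
        others = [
            (math.dist(coords, other_coords), other)
            for other, other_coords in nodes.items()
            if other != name
        ]
        # Keep the k closest; ties are broken alphabetically by name.
        for distance, other in sorted(others)[:k]:
            edges.append((name, other, distance))
    return edges


edges = knn_edges(beers, k=2)
for source, target, distance in edges:
    print(f"{source} -> {target}: {distance:.2f}")
```

Each beer contributes exactly k outgoing edges, so the edge list is directed: A can be one of B's nearest neighbors without B being one of A's.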

With a little formatting and a formula that translates distance into "closeness", the result is a network linking similar products (run the attached workflow in Alteryx v10.0 with the predictive tools installed to fully experience the interactive exploration):

[Figure: interactive network linking similar beers]
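The post does not spell out the formula used to turn distance into closeness. One simple option, assumed here purely for illustration, is 1 / (1 + distance), which maps a distance of 0 to a closeness of 1 and shrinks toward 0 as beers get farther apart. The edge values below are made up.

```python
def closeness(distance):
    """Map a non-negative distance to a weight in (0, 1].

    Assumed formula for this example, not necessarily the one in the
    attached workflow: identical points get 1.0, distant points near 0.
    """
    return 1.0 / (1.0 + distance)


# Hypothetical (source, target, distance) edges from a nearest neighbor step.
distance_edges = [
    ("Stout A", "Stout B", 3.0),
    ("IPA A", "IPA B", 1.0),
    ("Stout A", "Lager A", 9.0),
]

# Edge list with a weight column: larger weight means more similar.
weighted_edges = [
    (source, target, closeness(distance))
    for source, target, distance in distance_edges
]

for source, target, weight in weighted_edges:
    print(f"{source} -> {target}: weight {weight:.2f}")
```

Any decreasing function of distance would work. What matters is that stronger links get larger weights.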

Transaction Data and Market Basket Analysis

Finally, creating this type of product network is similar to market basket analysis (think of a Netflix-like recommendation engine for beer). To explore this idea further, I took transaction data from a grocery store and ran it through a market basket analysis looking for association rules (e.g., transactions containing both baby formula and diapers might suggest that purchasing baby formula increases the chance of purchasing diapers).
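To make the idea of an association rule concrete, here is a small sketch that computes the three standard rule metrics (support, confidence, and lift) for the rule "baby formula -> diapers". The five transactions are invented for the example, and `rule_metrics` is a hypothetical helper, not part of any Alteryx tool.

```python
# Hypothetical grocery transactions; each one is a set of items.
transactions = [
    {"baby formula", "diapers", "milk"},
    {"baby formula", "diapers"},
    {"baby formula", "bread"},
    {"diapers", "beer"},
    {"milk", "bread"},
]


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for antecedent -> consequent.

    Assumes the antecedent and the consequent each appear in at least
    one transaction, so no division by zero occurs.
    """
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)         # contain antecedent
    n_c = sum(1 for t in transactions if c <= t)         # contain consequent
    n_both = sum(1 for t in transactions if (a | c) <= t)  # contain both
    support = n_both / n          # share of all transactions with both
    confidence = n_both / n_a     # P(consequent | antecedent)
    lift = confidence / (n_c / n)  # confidence relative to baseline
    return support, confidence, lift


support, confidence, lift = rule_metrics(
    transactions, {"baby formula"}, {"diapers"}
)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 means the two items appear together more often than they would if purchases were independent.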

This is standard market basket (MB) analysis, and the static report created by the MB Inspect Tool includes a network diagram. To add an interactive visualization with more detailed network statistics, all I did was manipulate the data output:

[Figure: manipulating the market basket output into network input]
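The reshaping amounts to turning each rule into an edge and each item into a node. The sketch below shows one way to do that, assuming single-item rules and using lift as the edge weight. The rule values, the field names (`from`, `to`, `weight`), and the helper `rules_to_network` are all hypothetical and do not reflect the actual output schema of the Alteryx tools.

```python
# Hypothetical association rules: (antecedent item, consequent item, lift).
rules = [
    ("baby formula", "diapers", 1.11),
    ("bread", "milk", 1.25),
    ("diapers", "baby formula", 1.11),
]


def rules_to_network(rules):
    """Split a rule list into a node list and an edge list."""
    # Every item that appears on either side of a rule becomes a node.
    nodes = sorted({item for a, c, _ in rules for item in (a, c)})
    # Every rule becomes one directed, weighted edge.
    edges = [{"from": a, "to": c, "weight": w} for a, c, w in rules]
    return nodes, edges


nodes, edges = rules_to_network(rules)
print(nodes)
for edge in edges:
    print(edge)
```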

Hopefully now you can create meaningful networks to better understand the relationships between products.
