The network analysis tool provides a way to visually interact with all kinds of data (in Alteryx version 10.0, check out Help -> Sample Workflows -> Predictive Analytics -> 24 Network Analysis Tool). To run a network analysis, you must provide input data in a very specific form: a list of network nodes and a list of edges. So how do we get useful nodes and edges?
I set out to explore this question with one of my favorite datasets from OpenBeerDB.com:
| Name | Category | Style | IBU (bitterness) | SRM (color) | ABV (alcohol content) |
|---|---|---|---|---|---|
| Coffee Stout | North American Ale | American-Style Stout | 30 | 47 | 5 |
| Oktoberfest | German Lager | German-Style Oktoberfest | 93 | - | 7 |
| ... | ... | ... | ... | ... | ... |
My goal: To identify products (in this case beers) that are similar based on attributes (bitterness, color, alcohol content).
This type of analysis is similar to cluster analysis - so as a baseline I started by performing a standard k-means clustering analysis, which grouped my 18 beers into 3 groups:
You can see dark stouts getting identified together (group 2) and the bitter and alcoholic IPAs are also classified as similar (group 1).
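For readers who want to see the baseline idea outside of Alteryx, here is a minimal sketch of k-means (plain Lloyd's algorithm) in Python. It is not the Alteryx tool's implementation: the `kmeans` function, the choice to seed centroids with the first k points, and the (IBU, SRM, ABV) values are all assumptions made up for illustration.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Hypothetical (IBU, SRM, ABV) values, ordered so each seed lands in a different group.
points = [
    (30.0, 47.0, 5.0),   # stout-like
    (70.0, 10.0, 7.0),   # IPA-like
    (20.0, 5.0, 4.5),    # lager-like
    (35.0, 40.0, 6.0),   # stout-like
    (65.0, 12.0, 6.5),   # IPA-like
    (18.0, 4.0, 5.0),    # lager-like
]
labels = kmeans(points, k=3)
```

Each beer ends up with a single group label, which is exactly the limitation discussed next: the labels say which group a beer belongs to, but nothing about how strongly two particular beers relate.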
However, this analysis lacks two things:
- An interactive way to explore the results.
- Metrics to understand how specific beers relate to one another and to the network of beers as a whole.
To further the analysis, I created a dataset in which each product (beer) is a node. Here is the key: I identified links between the nodes using the nearest neighbor tool. The nearest neighbor tool identifies the k closest neighbors based on Euclidean distance between points (beers) in an n-dimensional space (what I will call "beer space" - a 3-dimensional space defined by bitterness, color, and alcohol content):
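The nearest neighbor step can be sketched in a few lines of Python. This is only an illustration of the idea, not what the Alteryx tool does internally: the `knn_edges` helper and the beer values are hypothetical, and the distances are computed on raw, unscaled attributes (so the attributes with the largest ranges dominate unless you standardize first).

```python
import math

# Hypothetical sample values (IBU, SRM, ABV); not taken from the dataset above.
beers = {
    "Stout A": (30.0, 47.0, 5.0),
    "Stout B": (35.0, 40.0, 6.0),
    "IPA A": (70.0, 10.0, 7.0),
    "IPA B": (65.0, 12.0, 6.5),
    "Lager A": (20.0, 5.0, 4.5),
}

def knn_edges(points, k):
    """Return (source, target, distance) for each point's k nearest neighbors."""
    edges = []
    for name, coords in points.items():
        # Euclidean distance from this beer to every other beer, closest first.
        dists = sorted(
            (math.dist(coords, other), other_name)
            for other_name, other in points.items()
            if other_name != name
        )
        for d, other_name in dists[:k]:
            edges.append((name, other_name, d))
    return edges

edges = knn_edges(beers, k=2)
```

Every beer contributes exactly k edges, so the value of k controls how dense the resulting network is.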
Add a little bit of formatting and a formula to translate distance into "closeness", and the result is a network linking similar products (run the attached workflow in Alteryx v10.0 with the predictive tools installed to fully experience the interactive exploration):
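The exact closeness formula is not spelled out here, so the sketch below assumes one common choice, closeness = 1 / (1 + distance), which maps a distance of 0 to 1 and shrinks toward 0 as beers get further apart. The input rows, the `closeness` helper, and the `from` / `to` / `closeness` field names are likewise assumptions for illustration.

```python
def closeness(distance):
    """Map a non-negative distance to a score in (0, 1]; 1 means identical."""
    return 1.0 / (1.0 + distance)

# Hypothetical (source, target, distance) rows, such as a nearest-neighbor step might produce.
neighbor_rows = [
    ("Stout A", "Stout B", 8.66),
    ("IPA A", "IPA B", 5.41),
    ("Stout A", "Lager A", 43.18),
]

# Node list: every beer that appears on either end of a link.
nodes = sorted({name for src, dst, _ in neighbor_rows for name in (src, dst)})

# Edge list: one record per link, weighted by closeness instead of distance.
edges = [
    {"from": src, "to": dst, "closeness": round(closeness(d), 3)}
    for src, dst, d in neighbor_rows
]
```

Any monotonically decreasing function of distance would serve the same purpose; the point is simply that stronger links should carry larger weights.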
Transaction Data and Market Basket Analysis
Finally, creating this type of product network is similar to market basket analysis (think a Netflix-like recommendation engine for beer). To explore this idea further, I took transaction data from a grocery store and ran it through a market basket analysis looking for association rules (e.g., transactions containing both baby formula and diapers might suggest that purchasing baby formula increases the chance of purchasing diapers).
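To make the idea of an association rule concrete, here is a small Python sketch that scores single-item rules by support, confidence, and lift. The transactions, thresholds, and function names are hypothetical, and the brute-force pair enumeration stands in for whatever algorithm the market basket tools actually use.

```python
from itertools import permutations

# Hypothetical transactions; each set is one basket.
transactions = [
    {"baby formula", "diapers"},
    {"baby formula", "diapers", "beer"},
    {"beer", "chips"},
    {"baby formula", "diapers", "chips"},
    {"beer", "diapers"},
]

def support(items):
    """Fraction of baskets containing every item in `items`."""
    return sum(items <= basket for basket in transactions) / len(transactions)

def pair_rules(min_support=0.4, min_confidence=0.7):
    """Score single-item rules lhs -> rhs by support, confidence, and lift."""
    all_items = sorted(set().union(*transactions))
    rules = []
    for lhs, rhs in permutations(all_items, 2):
        both = support({lhs, rhs})
        if both < min_support:
            continue
        confidence = both / support({lhs})   # P(rhs | lhs)
        lift = confidence / support({rhs})   # > 1 means a positive association
        if confidence >= min_confidence:
            rules.append((lhs, rhs, both, confidence, lift))
    return rules

rules = pair_rules()
```

With these made-up baskets, the only rules that clear both thresholds are the two directions of the baby formula and diapers pairing.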
This is standard MB analysis, and the static report created by the MB Inspect Tool includes a network diagram. To add an interactive visualization with more detailed network statistics, all I did was manipulate the data output:
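The manipulation amounts to reshaping rules into a node list and an edge list. A rough Python sketch of that reshaping follows, assuming each rule has a single item on each side and using lift as the edge weight; the rule rows, the `rules_to_network` helper, and the `from` / `to` / `weight` field names are hypothetical rather than the workflow's actual fields.

```python
# Hypothetical rule rows (lhs, rhs, lift), standing in for market basket output.
rules = [
    ("baby formula", "diapers", 1.25),
    ("diapers", "baby formula", 1.25),
    ("chips", "salsa", 2.10),
]

def rules_to_network(rule_rows):
    """Collapse directed rules into an undirected node list and edge list."""
    nodes = sorted({item for lhs, rhs, _ in rule_rows for item in (lhs, rhs)})
    best = {}
    for lhs, rhs, weight in rule_rows:
        key = tuple(sorted((lhs, rhs)))
        # Keep the strongest weight seen for each unordered pair.
        best[key] = max(weight, best.get(key, weight))
    edges = [{"from": a, "to": b, "weight": w} for (a, b), w in sorted(best.items())]
    return nodes, edges

nodes, edges = rules_to_network(rules)
```

Collapsing the two directions of a rule into one edge is a design choice; keeping them separate would give a directed network instead.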
Hopefully now you can create meaningful networks to better understand the relationships between products.