
Data Science

Machine learning & data science for beginners and experts alike.
SeanL
Alteryx Alumni (Retired)

The Network Analysis tool provides a way to visually interact with all kinds of data (in Alteryx version 10.0, check out Help -> Sample Workflows -> Predictive Analytics -> 24 Network Analysis Tool). To run a network analysis, you must provide a very specific form of input data: a list of network nodes and edges. How do we get useful nodes and edges?

 

I set out to explore this question with one of my favorite datasets from OpenBeerDB.com:

 

Name         | Category           | Style                    | IBU (bitterness) | SRM (color) | ABV (alcohol content, %)
Coffee Stout | North American Ale | American-Style Stout     | 30               | 47          | 5
Oktoberfest  | German Lager       | German-Style Oktoberfest | 93               | -           | 7
...          | ...                | ...                      | ...              | ...         | ...

 

My goal: to identify products (in this case, beers) that are similar based on their attributes (bitterness, color, and alcohol content).

 

This type of analysis is similar to cluster analysis, so as a baseline I started with a standard k-means cluster analysis, which grouped my 18 beers into 3 groups:

 

[Image: k-means clustering results (clusterAnalysisSoln.png)]

You can see that the dark stouts are grouped together (group 2), and the bitter, high-alcohol IPAs are also classified as similar (group 1).
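For readers who want to see what the baseline is doing outside Alteryx, here is a minimal sketch of plain k-means (Lloyd's algorithm) in Python. It is an illustration, not the implementation behind the Alteryx tool I used; the function name kmeans, the random initialization, and the fixed seed are my own assumptions. Points are tuples of (IBU, SRM, ABV), and no feature scaling is applied, so the attribute with the widest numeric range will dominate the Euclidean distance.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's k-means. Returns (labels, centroids).

    points: list of equal-length numeric tuples, e.g. (ibu, srm, abv).
    Assumption: centroids start at k randomly sampled points (seeded).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        # Stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids
```

For example, kmeans(points, 3) on a list of 18 (IBU, SRM, ABV) tuples returns one cluster label per beer plus the three centroids. Rows with a missing value (like the Oktoberfest SRM above) would need to be dropped or filled first.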

 

However, this analysis lacks two things:

 

    1. An interactive way to explore the results.
    2. Metrics to understand how specific beers relate to one another and to the network of beers as a whole.

To take the analysis further, I created a dataset in which each product (beer) was a node. Here is the key: I identified links between the nodes using the Nearest Neighbor tool. It identifies the k closest neighbors of each point (beer) based on the Euclidean distance between points in an n-dimensional space (what I will call "beer space": a 3-dimensional space defined by bitterness, color, and alcohol content):

 

[Image: network analysis workflow (network_workflow.png)]
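The linking step can be sketched the same way. This hypothetical knn_edges function (my naming, not an Alteryx API) takes a dict of beer name -> (IBU, SRM, ABV) and, for every beer, emits an edge to each of its k nearest neighbors by Euclidean distance. The beer names become the node list and the returned tuples become the edge list.

```python
import math


def knn_edges(beers, k):
    """Link each beer to its k nearest neighbors in "beer space".

    beers: dict mapping beer name -> (ibu, srm, abv) tuple.
    Returns a list of (source, target, distance) edges.
    """
    edges = []
    for name, coords in beers.items():
        # Euclidean distance from this beer to every other beer.
        others = [
            (math.dist(coords, other), other_name)
            for other_name, other in beers.items()
            if other_name != name
        ]
        others.sort()  # closest first; ties broken by name
        for dist, other_name in others[:k]:
            edges.append((name, other_name, dist))
    return edges
```

Because each beer picks its own neighbors, the edges are directed: if A is among B's nearest neighbors and B is among A's, both A -> B and B -> A appear, and you may want to deduplicate them before building an undirected network.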

 

After a little formatting and a formula to translate distance into "closeness", the result is a network linking similar products (run the attached workflow in Alteryx 10.0 with the predictive tools installed to fully experience the interactive exploration):

 

[Image: resulting beer network (resulting_network.png)]
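The exact distance-to-closeness formula is not given here, so the sketch below assumes one common choice, closeness = 1 / (1 + distance), which maps a distance of 0 to 1.0 and shrinks toward 0 as beers move apart. It also computes degree centrality (the share of other beers each beer is directly linked to) as one simple example of a per-node network metric. Both function names are hypothetical, and the edges are assumed to be (source, target, distance) tuples.

```python
from collections import defaultdict


def to_closeness(edges):
    """Convert (source, target, distance) edges to (source, target, closeness).

    Assumption: closeness = 1 / (1 + distance). This is an edge weight,
    not the "closeness centrality" node metric.
    """
    return [(s, t, 1.0 / (1.0 + d)) for s, t, d in edges]


def degree_centrality(edges, nodes):
    """Fraction of the other nodes each node is directly linked to.

    Edges are treated as undirected, so A -> B and B -> A count once.
    """
    neighbors = defaultdict(set)
    for s, t, _ in edges:
        neighbors[s].add(t)
        neighbors[t].add(s)
    n = len(nodes)
    return {node: len(neighbors[node]) / (n - 1) for node in nodes}
```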

 

 

Transaction Data and Market Basket Analysis 

 

Finally, creating this type of product network is similar to market basket analysis (think of a Netflix-like recommendation engine for beer). To explore this idea further, I took transaction data from a grocery store and ran it through a market basket analysis to look for association rules (e.g., frequent transactions containing both baby formula and diapers might suggest that purchasing baby formula increases the chance of purchasing diapers).
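To make the association-rule metrics concrete, here is a brute-force sketch that scores every single-item rule {a} -> {b} found in the transactions. It is not Apriori and not what the Alteryx MB tools run internally; pair_rules and min_support are illustrative names. Support is the share of baskets containing both items, confidence is P(b | a), and lift is confidence divided by the overall share of baskets containing b, so a lift above 1 means buying a makes b more likely than it is on average.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.0):
    """Score every rule {a} -> {b} for item pairs seen in the transactions.

    transactions: list of baskets, each an iterable of item names.
    Returns a list of dicts with lhs, rhs, support, confidence and lift.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # dedupe and fix pair order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each pair yields two rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            lift = confidence / (item_counts[rhs] / n)
            rules.append({"lhs": lhs, "rhs": rhs, "support": support,
                          "confidence": confidence, "lift": lift})
    return rules
```

With the baskets [["formula", "diapers"], ["formula", "diapers", "milk"], ["milk"], ["diapers"]], the rule formula -> diapers gets support 0.5, confidence 1.0, and a lift of about 1.33.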

 

This is a standard market basket analysis, and the static report created by the MB Inspect tool includes a network diagram. To add an interactive visualization with more detailed network statistics, all I did was reshape the data output:

[Image: market basket analysis workflow (mb_analysis.png)]

 

[Image: resulting market basket network (mb_network_result.png)]
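Reshaping rules into network input can be as simple as treating each item as a node and each rule as a directed edge weighted by its lift. The sketch below assumes rules are dicts with single-item "lhs" and "rhs" keys plus a "lift" value; the function name rules_to_network and the lift > 1 filter are my own assumptions, not the exact transformation in the attached workflow.

```python
def rules_to_network(rules, min_lift=1.0):
    """Turn association rules into node and edge tables for a network tool.

    rules: iterable of dicts with "lhs", "rhs" and "lift" keys (single items).
    Assumption: only rules with lift above min_lift become edges.
    Returns (sorted node list, list of (source, target, lift) edges).
    """
    edges = [(r["lhs"], r["rhs"], r["lift"]) for r in rules if r["lift"] > min_lift]
    # Every item that appears in a kept rule becomes a node.
    nodes = sorted({item for s, t, _ in edges for item in (s, t)})
    return nodes, edges
```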

 

Hopefully now you can create meaningful networks to better understand the relationships between products.
