Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
BenMoss
ACE Emeritus

In-depth demographic information is often seen as the holy grail by data analysts and their business stakeholders, and for good reason: the context it adds to your analytics is hugely advantageous.

 

Without demographic information, our analysis may be incomplete or, even worse, inaccurate.

 

I will walk through an example of how we can ‘loosely’ (there are some assumptions) map population data onto your own data to solve this issue.

 

I will be using data from Transport for London (TFL) to understand the potential footfall for each tube station in London; the dataset provides the latitude and longitude of each of these stations.

 

The purpose of overlaying population data in this example is to help understand the potential footfall of each station, along with the demographic characteristics of that population, such as age and gender. This may allow us to provide better facilities to our users or, from a commercial perspective, allow our marketing team to better understand the potential target markets at each station for advertising campaigns (advertising generated roughly £140 million of revenue for TFL in 2017).

 

tube.jpg


To maximise the accuracy of our outputs, we should look for demographic information that is both at the lowest level of detail and as recent as we can find.

 

In the UK, the Office for National Statistics (ONS) releases mid-year population estimates, broken down by age and gender, at the level of Lower Super Output Areas (commonly referred to as LSOAs). LSOAs are geographical areas that contain, on average, roughly 1,000 residents and 650 households; other characteristics, such as social homogeneity, also play a role in defining their boundaries.

 

Much of the UK’s data is collected at this level as well, such as the indices of deprivation.

 

In the US, census data can be obtained from the United States Census Bureau.


When I say ‘loose method’ what exactly do I mean? The process is simple:

 

  1. Take the station trade area and spatially match it against our population area
  2. For each matched population area, create an intersect object between itself and the trade area
  3. Identify the size of the original population area
  4. Identify the size of the associated intersect polygon area
  5. From these two values, calculate the percentage of the population area that is contained within the trade area
  6. Multiply this value by the demographic variables we have
  7. Done!
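
Written as a formula, the steps above amount to an area-weighted sum. The notation here is my own shorthand rather than anything from Alteryx: T is a trade area, L_i is the i-th matched population area, and P_i is one of its demographic counts. The ‘loose’ part is the implied assumption that people are spread evenly across each population area.

```latex
% Share of population area L_i that falls inside trade area T (step 5)
w_{i,T} = \frac{\operatorname{area}(L_i \cap T)}{\operatorname{area}(L_i)}

% Estimated count for trade area T, summed over all matched areas (step 6)
\hat{P}_T = \sum_{i} w_{i,T} \, P_i
```

As a purely hypothetical illustration, a population area with 1,500 residents and 40% of its area inside the trade area would contribute 0.4 × 1,500 = 600 residents to that trade area's estimate.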

Next question: How do we do this with Alteryx?

 

First things first: if we haven’t already, we need to generate some trade areas against which we can map our demographic information.

 

In this case, I have created a 5-mile boundary around each station using the Trade Area tool. I have also checked the ‘Eliminate Overlap’ option, as I anticipate our customers will only ever travel to their closest station.
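
To illustrate the idea behind a radius with overlap eliminated, here is a minimal Python sketch that assigns a point to its closest station, provided that station is within 5 miles. It is not how the Trade Area tool works internally; the function names, station names and coordinates are all hypothetical, and distances use the haversine great-circle approximation.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, an approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_station(point, stations, radius_miles=5.0):
    """Return the name of the closest station within radius_miles, or None."""
    best_name, best_dist = None, radius_miles
    for name, (lat, lon) in stations.items():
        dist = haversine_miles(point[0], point[1], lat, lon)
        if dist <= best_dist:
            best_name, best_dist = name, dist
    return best_name


# Hypothetical station names and coordinates, for illustration only
stations = {
    "Station A": (51.50, -0.12),
    "Station B": (51.53, -0.10),
}
```

Calling `nearest_station((51.505, -0.12), stations)` picks out "Station A", while a point more than 5 miles from every station returns `None`, which mirrors a location that falls outside all of the trade areas.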

 

map.png

 

Next, we need to input the demographic information that we have deemed appropriate to map against our data. In this case, I have used the LSOA data mentioned earlier, with population numbers broken down by age and gender.

 

Once both sources are on our canvas, we need to perform a spatial match of the two datasets. Here, I have configured the Spatial Match tool to return objects that ‘touched or intersected’ the second object.

 

config.png

 

For each of your trade areas, you will now have a list of the population areas that are, at least in part, within your trade areas.

 

results.png

 

We now know which population areas are linked to which trade areas, and we have the spatial objects for each along with the intersect polygon for each match. From these, we can work out how much of each population area overlaps its trade area.

 

To do this, I will use two Spatial Info tools, one to return the area of the original population polygon and one for the intersect polygon. You could also use a Formula tool, which supports spatial functions.
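
For readers who want to see the arithmetic behind this step, here is a rough Python sketch of the two area measurements and the resulting overlap rate. It assumes the polygons are simple and expressed in planar (projected) coordinates, whereas Alteryx measures area on the actual geography. The function names and shapes are hypothetical.

```python
def polygon_area(vertices):
    """Planar area of a simple polygon via the shoelace formula.

    vertices: list of (x, y) tuples in a projected coordinate system,
    listed in order around the boundary.
    """
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def overlap_fraction(population_polygon, intersect_polygon):
    """Share of the population area that falls inside the trade area."""
    return polygon_area(intersect_polygon) / polygon_area(population_polygon)


# Hypothetical shapes: a 2 x 2 population area whose right-hand half
# (a 1 x 2 strip) lies inside the trade area.
lsoa = [(0, 0), (2, 0), (2, 2), (0, 2)]
intersect = [(1, 0), (2, 0), (2, 2), (1, 2)]
fraction = overlap_fraction(lsoa, intersect)  # half of the area overlaps
```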

 

canvas.png

 

Once we have these values, it is just a case of identifying the demographic attributes we wish to bring in, joining these to this file, and multiplying each of them by the percentage overlap.
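
Outside of Alteryx, the join-and-multiply step might look something like the Python sketch below, which also sums the weighted values for each trade area. The identifiers, variable names and counts are made up for illustration and are not taken from the ONS data.

```python
from collections import defaultdict


def apportion(matches, demographics):
    """Estimate demographic totals per trade area.

    matches: iterable of (trade_area_id, population_area_id, overlap_fraction)
    demographics: dict of population_area_id -> {variable: count}
    """
    totals = defaultdict(lambda: defaultdict(float))
    for trade_area_id, population_area_id, fraction in matches:
        # The "join": look up the demographic record for this population area
        record = demographics.get(population_area_id)
        if record is None:
            continue  # no demographic record for this area, so skip it
        for variable, count in record.items():
            # Weight each count by the share of the area inside the trade area
            totals[trade_area_id][variable] += count * fraction
    return {area: dict(values) for area, values in totals.items()}


# Hypothetical identifiers and counts
demographics = {
    "LSOA-1": {"female_20_29": 100, "male_20_29": 120},
    "LSOA-2": {"female_20_29": 200, "male_20_29": 180},
}
matches = [
    ("Station A", "LSOA-1", 1.0),   # entirely inside Station A's trade area
    ("Station A", "LSOA-2", 0.25),  # a quarter inside Station A's trade area
    ("Station B", "LSOA-2", 0.75),  # the remainder inside Station B's
]
estimates = apportion(matches, demographics)
```

Because the overlap-eliminated trade areas do not overlap one another, the fractions for a given population area should never sum to more than 1, so nobody is counted twice.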


A complete workflow can be found here, and a sample visualisation that I have built with this data can be found here.

 

Ben
