In-depth demographic information is often seen as the holy grail by data analysts and their business stakeholders, and for good reason: the context it adds to your analytics is hugely advantageous.
Without demographic information, our analysis may be incomplete or, even worse, inaccurate.
I will walk through an example of how we can ‘loosely’ (there are some assumptions) map population data onto your data to solve this issue.
I will be using data from Transport for London (TfL) to understand the potential footfall for each tube station in London; the dataset provides the latitude and longitude of each station.
The purpose of overlaying population data in this example is to help understand the potential footfall of each station, along with the demographic characteristics of that population, such as age and gender. This may allow us to provide better facilities to our users or, from a commercial perspective, allow our marketing team to better understand the potential target markets at each station for advertising campaigns (advertising generated roughly £140 million in revenue for TfL in 2017).
To maximise the accuracy of our outputs, we should look for demographic information that is both as granular and as recent as we can find.
In the UK, the Office for National Statistics releases mid-year population estimates, broken down by age and gender, at the level of Lower Super Output Areas (commonly referred to as LSOAs). LSOAs are geographical areas that contain an average of roughly 1,500 residents and 650 households, whilst other characteristics such as social homogeneity also play a role in defining their boundaries.
Much of the UK’s data is collected at this level as well, such as the indices of deprivation.
When I say ‘loose method’ what exactly do I mean? The process is simple:
1. Take the station trade area and spatially match it against our population areas
2. For each matched population area, create an intersect object between it and the trade area
3. Identify the size of the original population area
4. Identify the size of the associated intersect polygon
5. From these two values, calculate the percentage of the population area that falls within the trade area
6. Multiply this percentage by each of the demographic variables we have
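To make the arithmetic behind these steps concrete, here is a minimal sketch in Python. It is not the Alteryx workflow itself: it assumes every area is a simple axis-aligned rectangle rather than a real LSOA or trade area polygon, and the names (`Rect`, `apportion`, the demographic keys) are hypothetical. The underlying assumption, as in the loose method, is that the population is spread evenly across each population area.

```python
from typing import Dict, Tuple

# Assumption: areas are axis-aligned rectangles (min_x, min_y, max_x, max_y),
# a stand-in for the real polygons used in the workflow.
Rect = Tuple[float, float, float, float]


def area(r: Rect) -> float:
    """Area of a rectangle; zero if the rectangle is empty."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def intersect(a: Rect, b: Rect) -> Rect:
    """Rectangle shared by a and b (empty, with area 0, if they do not overlap)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def apportion(pop_area: Rect, trade_area: Rect,
              demographics: Dict[str, float]) -> Dict[str, float]:
    """Scale each demographic count by the share of pop_area inside trade_area."""
    total = area(pop_area)
    if total == 0:
        raise ValueError("population area has zero size")
    share = area(intersect(pop_area, trade_area)) / total
    return {name: count * share for name, count in demographics.items()}


# Hypothetical example: a quarter of the population area lies in the trade area.
lsoa = (0.0, 0.0, 4.0, 2.0)        # size 8
trade = (3.0, 0.0, 10.0, 10.0)     # overlaps the rightmost strip, size 2
estimate = apportion(lsoa, trade, {"female_25_34": 120, "male_25_34": 100})
print(estimate)
```

Because a quarter of the population area sits inside the trade area, a quarter of each demographic count is attributed to that trade area.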
Next question: How do we do this with Alteryx?
First things first: if we haven’t already, we need to generate the trade areas against which we will map our demographic information.
In this case, I have created a 5-mile boundary around each station using the trade area tool. I have also checked the option to ‘Eliminate Overlap’, as I anticipate our customers will only ever travel to the closest station.
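The assumption behind these settings can be sketched in Python: a location belongs to a station's trade area only if that station is the nearest one and it lies within 5 miles. This illustrates the assumption rather than how the trade area tool builds its polygons; the function names are hypothetical and the station coordinates are approximate, for illustration only.

```python
import math
from typing import Dict, Optional, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in miles between two (latitude, longitude) points."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def assign_station(point: LatLon, stations: Dict[str, LatLon],
                   radius_miles: float = 5.0) -> Optional[str]:
    """Return the nearest station if it is within the radius, otherwise None."""
    name, dist = min(
        ((n, haversine_miles(point, loc)) for n, loc in stations.items()),
        key=lambda pair: pair[1],
    )
    return name if dist <= radius_miles else None


# Approximate coordinates, for illustration only.
stations = {
    "King's Cross": (51.5308, -0.1238),
    "Brixton": (51.4627, -0.1145),
}
print(assign_station((51.52, -0.12), stations))   # a point in central London
print(assign_station((52.48, -1.90), stations))   # a point far outside London
```

Assigning each location to a single station is what prevents the same residents from being counted in two overlapping trade areas.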
Next, we need to input our demographic information that we have deemed appropriate to map against our data. In this case, I have used the LSOA data mentioned earlier, with population numbers broken down by age and gender.
Once we have brought both sources onto our canvas, we need to perform a spatial match of the two datasets. Here, I have configured the spatial match to return population areas that ‘touch or intersect’ a trade area.
For each of your trade areas, you will now have a list of the population areas that are, at least in part, within your trade areas.
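As a rough picture of what this output looks like, the sketch below performs a ‘touches or intersects’ match in Python. It again assumes axis-aligned rectangles in place of real polygons, and the identifiers (`spatial_match`, "Station A", "LSOA 1" and so on) are made up for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: areas are axis-aligned rectangles (min_x, min_y, max_x, max_y).
Rect = Tuple[float, float, float, float]


def touches_or_intersects(a: Rect, b: Rect) -> bool:
    """True if the rectangles overlap or share any boundary point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def spatial_match(trade_areas: Dict[str, Rect],
                  pop_areas: Dict[str, Rect]) -> Dict[str, List[str]]:
    """For each trade area, list the population areas that touch or intersect it."""
    return {
        trade_id: [pop_id for pop_id, pop in pop_areas.items()
                   if touches_or_intersects(trade, pop)]
        for trade_id, trade in trade_areas.items()
    }


# Hypothetical inputs.
trade_areas = {"Station A": (0, 0, 5, 5), "Station B": (5, 0, 10, 5)}
pop_areas = {
    "LSOA 1": (1, 1, 2, 2),      # fully inside Station A
    "LSOA 2": (4, 4, 6, 6),      # straddles both trade areas
    "LSOA 3": (10, 5, 12, 7),    # only touches a corner of Station B
    "LSOA 4": (20, 20, 21, 21),  # matches nothing
}
matches = spatial_match(trade_areas, pop_areas)
print(matches)
```

Note that a population area can appear under more than one trade area, and that an area which only touches a boundary (like "LSOA 3" here) is still matched even though none of its area lies inside the trade area. This is why the overlap size has to be measured before any population is attributed.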
We now know which population areas are linked to which trade areas, and we have the spatial object for each, along with the intersect polygon for each match. From these, we can work out how much of each population area overlaps its trade area.