Hi all,
This kind of analysis is new to me so any help would be appreciated.
I have demographic data for the trade areas of 8 different store sites. The format for this is similar to:
Site | Category | Demographic Metric | No. of People | Area % | Base % | Index | Site Sales |
A | Gender | Female | 4000 | 40 | 30 | 108 | £10,000 |
A | Gender | Male | 6000 | 60 | 70 | 85 | £10,000 |
A | Vehicles | 0 cars | 9000 | 90 | 80 | 110 | £10,000 |
A | Vehicles | 1 car | 500 | 5 | 8 | 98 | £10,000 |
A | Vehicles | 2 cars | 300 | 3 | 6 | 98 | £10,000 |
A | Vehicles | 3 cars or more | 200 | 2 | 6 | 97 | £10,000 |
B | Gender | Female | 20000 | 20 | 30 | 80 | £12,000 |
etc.
Base % is the average for the country as a whole. Index shows how over- or under-represented each demographic group is in a site's trade area relative to the base (100 equals the base, above 100 is over-represented, below 100 is under-represented). I have ~200 different demographic metrics for each of my 8 sites. All of these current sites are perceived as successful, with little variance in sales.
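Before any comparison, it can help to reshape the long table into one row per site and one column per metric. A minimal sketch with pandas, assuming the data has been loaded into a DataFrame; the shortened column names (`Metric`, `People`, `AreaPct`, `BasePct`, `Index`, `Sales`) and the numeric sales values are my own choices, and the rows are the example rows from the table above:

```python
import pandas as pd

# Example rows from the table above. Site B is truncated there, so its other
# metrics come out as NaN after pivoting. Column names are shortened for code.
columns = ["Site", "Category", "Metric", "People", "AreaPct", "BasePct", "Index", "Sales"]
rows = [
    ("A", "Gender", "Female", 4000, 40, 30, 108, 10000),
    ("A", "Gender", "Male", 6000, 60, 70, 85, 10000),
    ("A", "Vehicles", "0 cars", 9000, 90, 80, 110, 10000),
    ("A", "Vehicles", "1 car", 500, 5, 8, 98, 10000),
    ("A", "Vehicles", "2 cars", 300, 3, 6, 98, 10000),
    ("A", "Vehicles", "3 cars or more", 200, 2, 6, 97, 10000),
    ("B", "Gender", "Female", 20000, 20, 30, 80, 12000),
]
long_df = pd.DataFrame(rows, columns=columns)

# One row per site, one column per (Category, Metric) pair, holding the Index.
wide_index = long_df.pivot_table(
    index="Site", columns=["Category", "Metric"], values="Index"
)

# Sales are repeated on every row of a site, so take the first value per site.
sales = long_df.groupby("Site")["Sales"].first()

print(wide_index)
print(sales)
```

The same pivot with `values="AreaPct"` gives the area shares, which are useful as weights when some groups cover very few people.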
I have shortlisted 12 new sites for consideration and have the same demographic data for those. I want to:
1. Work out how comparable my 12 new sites are in terms of demographics vs the 8 existing sites (and why). This needs to take into account that not all demographic metrics are equally important, e.g. it doesn't necessarily matter how many people in the trade areas of my new sites have 3 or more cars, as this applies to relatively few people.
2. Forecast sales for the 12 new sites, based on existing site sales
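For point 1, one option (a weighting scheme I'm suggesting, not a standard recipe) is a weighted distance between index profiles, where each metric's weight is proportional to the share of the population it covers, so small groups such as "3 cars or more" barely move the result. For point 2, the same distances give a simple nearest-neighbour forecast: the average sales of the k most similar existing sites. A sketch in Python with NumPy; every number below is hypothetical:

```python
import numpy as np

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two vectors of index values."""
    return float(np.sqrt(np.sum(weights * (a - b) ** 2)))

def knn_forecast(new_site, existing, sales, weights, k=3):
    """Average sales of the k existing sites closest to new_site.
    Returns (forecast, indices of the nearest sites, all distances)."""
    dists = np.array([weighted_distance(new_site, row, weights) for row in existing])
    nearest = np.argsort(dists)[:k]
    return float(sales[nearest].mean()), nearest, dists

# Hypothetical index values; columns are Female, Male, 0 cars, 3 cars or more.
existing_index = np.array([
    [108,  85, 110,  97],
    [ 95, 102, 105, 120],
    [100, 100,  98,  90],
    [ 90, 110, 115, 130],
], dtype=float)
existing_sales = np.array([10000, 12000, 11000, 10500], dtype=float)

# Hypothetical average share of the population in each group (Area % / 100).
# Groups covering few people (e.g. "3 cars or more") get small weights.
shares = np.array([0.45, 0.55, 0.85, 0.02])
weights = shares / shares.sum()

new_site = np.array([102, 98, 100, 150], dtype=float)
forecast, nearest, dists = knn_forecast(new_site, existing_index, existing_sales, weights)
print("distances:", np.round(dists, 1))
print("nearest existing sites:", nearest, "forecast:", forecast)
```

With little sales variance among the existing sites, this forecast will stay close to their average; the distances themselves are probably the more useful output for ranking candidates.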
I'm not sure clustering is the way forward, as the existing sites are all fairly similar. What I need is a test for similarity. Any advice, please?
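A rough similarity check that avoids clustering: measure how far each existing site is from the centroid of the other seven, then see where a new site's distance from the centroid of all eight falls within that spread. This is a heuristic of my own, not a formal statistical test, and with 8 sites the score can only take a handful of values. The weighting and all data below are hypothetical:

```python
import numpy as np

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two vectors of index values."""
    return float(np.sqrt(np.sum(weights * (a - b) ** 2)))

def loo_distances(existing, weights):
    """Distance from each existing site to the centroid of the other existing sites."""
    out = []
    for i in range(existing.shape[0]):
        others = np.delete(existing, i, axis=0)
        out.append(weighted_distance(existing[i], others.mean(axis=0), weights))
    return np.array(out)

def typicality(new_site, existing, weights):
    """Fraction of existing sites that sit at least as far from their peers as
    new_site sits from the existing group. 1.0 = at least as central as every
    existing site, 0.0 = further out than every existing site."""
    ref = loo_distances(existing, weights)
    d_new = weighted_distance(new_site, existing.mean(axis=0), weights)
    return float(np.mean(ref >= d_new)), d_new, ref

# Hypothetical index values for 8 existing sites
# (columns: Female, Male, 0 cars, 3 cars or more).
existing_index = np.array([
    [108,  85, 110,  97], [ 95, 102, 105, 120],
    [100, 100,  98,  90], [ 90, 110, 115, 130],
    [104,  96, 108, 100], [ 99, 101, 102, 110],
    [106,  92, 111,  95], [ 97, 104, 100, 105],
], dtype=float)
# Hypothetical population shares used as weights, as in the point 1 sketch.
weights = np.array([0.45, 0.55, 0.85, 0.02])
weights = weights / weights.sum()

for name, site in [("candidate 1", [100, 100, 106, 108]), ("candidate 2", [80, 125, 70, 60])]:
    score, d_new, _ = typicality(np.array(site, dtype=float), existing_index, weights)
    print(f"{name}: distance {d_new:.1f}, typicality {score:.2f}")
```

The score also answers the "why" part: comparing the weighted squared differences metric by metric shows which groups push a candidate away from the existing sites.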
Many thanks
Why not run a linear regression on your existing locations to determine which demographic properties are important to their performance? The results would show you how to compare the new locations demographically, and would give you a model to score the potential locations.
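A minimal sketch of that regression using NumPy's least-squares solver. With only 8 sites, ordinary least squares cannot estimate all ~200 metrics at once (there would be more unknowns than observations), so this assumes a couple of metrics have been preselected; the metrics and numbers are hypothetical:

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y on X plus an intercept; returns [intercept, b1, b2, ...]."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, X):
    """Predicted sales for rows of X using coefficients from fit_ols."""
    return coef[0] + X @ coef[1:]

# Hypothetical data for the 8 existing sites: two preselected index metrics.
X_existing = np.array([
    [108, 110], [95, 105], [100, 98], [90, 115],
    [104, 108], [99, 102], [106, 111], [97, 100],
], dtype=float)
sales_existing = np.array(
    [10000, 12000, 11000, 10500, 10200, 11800, 10800, 11300], dtype=float
)

coef = fit_ols(X_existing, sales_existing)

# Score hypothetical candidate sites with the fitted model.
X_new = np.array([[102, 100], [92, 118]], dtype=float)
print("coefficients:", np.round(coef, 1))
print("predicted sales:", np.round(predict(coef, X_new)))
```

Keeping the number of predictors well below the number of sites matters here; otherwise the model can fit the 8 sites perfectly and tell you nothing about new ones.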
I missed the sample size (only 8 existing sites) mentioned in the first post. Your concerns are sensible, as there are certainly obstacles in this analysis, but it sounds like you'll be able to present your results with the appropriate caveats for the methodology.
Consider building each demographic variable as two separate measurements: an absolute one (count of female population) and a relative one (percentage of female population). You could also increase the sample for indexing by including locations of similar businesses, and bootstrap the existing locations for your regression.
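A sketch of the bootstrapping idea, also using a count and a percentage as separate predictors. Resampling the 8 sites with replacement does not add information, but the spread of the refitted coefficients gives a feel for how stable each one is. All values are hypothetical, and `bootstrap_coefs` is a helper written for illustration:

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y on X plus an intercept; returns [intercept, b1, b2, ...]."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def bootstrap_coefs(X, y, n_boot=2000, seed=0):
    """Refit OLS on resamples of the sites drawn with replacement.
    Returns an (n_boot, n_predictors + 1) array of coefficient draws."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, X.shape[1] + 1))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # site indices, with repeats
        draws[b] = fit_ols(X[idx], y[idx])
    return draws

# Hypothetical site-level data: absolute count and percentage of female residents.
female_count = np.array([4000, 20000, 8000, 15000, 6000, 12000, 9000, 11000], dtype=float)
female_pct = np.array([40, 20, 45, 38, 50, 42, 35, 47], dtype=float)
sales = np.array([10000, 12000, 11000, 11500, 10200, 11800, 10800, 11300], dtype=float)

X = np.column_stack([female_count, female_pct])
draws = bootstrap_coefs(X, sales)

# 90% percentile intervals for [intercept, count coefficient, pct coefficient].
lo, hi = np.percentile(draws, [5, 95], axis=0)
for name, a, b in zip(["intercept", "female_count", "female_pct"], lo, hi):
    print(f"{name}: {a:.3g} to {b:.3g}")
```

A coefficient whose interval comfortably straddles zero is a weak basis for comparing or scoring new sites.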