Maveryx Success Stories

Crime Mapping – What If You Could Predict the Future?

callumstevens
5 - Atom
2018 Excellence Awards Entry: Crime Mapping – What If You Could Predict the Future?
 
Name: Kishan Dosa
Title: Senior Business Intelligence Consultant
Company: Climber
Collaborators: Kishan Dosa, Callum Stevens, Nathanael Norris, Goncalo Pereira
 
climber.png
 
 
 
Overview of Use Case:

The objective of the project was to use a series of Alteryx workflows and predictive models to generate predicted crime figures for metropolitan areas in England over the next three years. Additionally, we wanted to test the validity of our forecasts using data for the first three months of 2018.

 

Describe the business challenge or problem you needed to solve:

Climber recently became an Alteryx Preferred Partner and was keen to demonstrate its Alteryx abilities to clients. To do this, we looked for publicly available datasets to work with, which led us to the Crime Statistics for England and Wales. We thought that if we could predict crime rates for the next few years, the information could be used to make the most of spend on staffing, decide how best to allocate resources in specific time frames, and identify groups of areas to understand why that level of crime occurs in the first place. Crime today is a sensitive topic: it’s on the rise, and we need to understand the implications to know how best to deal with it.

 

Describe your working solution:
 

We loaded crime statistics datasets from the previous three years and filtered out unnecessary fields and rows. We parsed the LSOA (Lower Layer Super Output Area) field to extract the local area names, which we then mapped to wider region names using the Local Authority District to Region Lookup in England dataset [2]. Crime categories were reclassified into higher-level groups for consistency, because the categories have varied in recent years.

 

crime-type-mappings.png
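The preparation steps described above can be sketched outside Alteryx in a few lines of Python. This is only an illustration, not the workflow we built: the column names ("Month", "LSOA name", "Crime type") are assumed to match the police.uk CSV export, the rule that an LSOA name is a district name followed by a short code is an assumption, and both lookup tables are small hypothetical stand-ins for the real region lookup and crime-type mapping.

```python
from collections import Counter

# Hypothetical lookups for illustration only; the real ones come from the
# district-to-region file and the crime-type mapping, not reproduced here.
DISTRICT_TO_REGION = {
    "Westminster": "London",
    "Leeds": "Yorkshire and The Humber",
}
CRIME_GROUP = {
    "Burglary": "Theft",
    "Shoplifting": "Theft",
    "Violence and sexual offences": "Violence",
}


def district_from_lsoa_name(lsoa_name):
    """Drop the trailing LSOA code, e.g. 'Westminster 018A' -> 'Westminster'."""
    return lsoa_name.rsplit(" ", 1)[0]


def monthly_counts(rows):
    """Count crimes per (month, region, crime group), skipping unmappable rows."""
    counts = Counter()
    for row in rows:
        lsoa_name = row.get("LSOA name", "")
        if not lsoa_name:
            continue  # no location recorded for this crime
        region = DISTRICT_TO_REGION.get(district_from_lsoa_name(lsoa_name))
        group = CRIME_GROUP.get(row.get("Crime type"))
        if region is None or group is None:
            continue  # district or crime type missing from the lookups
        counts[(row["Month"], region, group)] += 1
    return counts


# Made-up sample rows in the assumed column layout.
rows = [
    {"Month": "2017-01", "LSOA name": "Westminster 018A", "Crime type": "Burglary"},
    {"Month": "2017-01", "LSOA name": "Westminster 013B", "Crime type": "Shoplifting"},
    {"Month": "2017-01", "LSOA name": "Leeds 111B", "Crime type": "Violence and sexual offences"},
    {"Month": "2017-01", "LSOA name": "", "Crime type": "Burglary"},
]
counts = monthly_counts(rows)
```

The resulting counts per month, region and crime group are the kind of series that the forecasting step takes as input.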

  

Time Series Forecasting: We performed time series forecasting to predict crime rates on a test dataset (January-March 2018). We used the “TS Model Factory” and “TS Forecast Factory” tools from the Alteryx Gallery to run ARIMA (Autoregressive Integrated Moving Average) and ETS (Exponential Smoothing) models.
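To give a feel for what an exponential smoothing forecast does, here is a minimal Python sketch of simple exponential smoothing, the most basic member of that family. It is not what the Alteryx tools run: they fit richer models with trend and seasonality and choose the parameters automatically. The smoothing factor `alpha` and the monthly counts below are made up for illustration.

```python
def ses_forecast(series, alpha, horizon):
    """Simple exponential smoothing: a flat forecast from the last smoothed level."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        # New level = weighted blend of the latest observation and the old level.
        level = alpha * value + (1 - alpha) * level
    return [level] * horizon


# Hypothetical monthly crime counts for one region.
history = [100, 110, 105, 120]
next_three_months = ses_forecast(history, alpha=0.5, horizon=3)
print(next_three_months)  # [112.5, 112.5, 112.5]
```

A higher `alpha` makes the forecast react faster to recent months; a lower one smooths out more of the noise.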

 

Validation: We then validated the predictions against the real crime numbers for January-March 2018.

 

2.png

 

The validation results show errors across the different regions:

 

validation-results.png 

The error for our “ETS” model was 12.2%, suggesting our forecasts were 87.8% accurate. The error for the “ARIMA” model was 12.1%, suggesting its forecasts were 87.9% accurate. Since the “ARIMA” model performed slightly better than the “ETS” model, we decided to use its forecasts for our analysis.
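The way a percentage error turns into an accuracy figure can be shown with a short Python sketch. It assumes the error is a mean absolute percentage error (MAPE) and that accuracy is simply 100 minus that error; the exact measure is not spelled out above, and the actual and forecast counts here are invented.

```python
def mean_absolute_percentage_error(actual, forecast):
    """MAPE in percent; actual values must be non-zero."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100 * total / len(actual)


# Hypothetical actual vs forecast counts for three months.
actual = [200, 250, 400]
forecast = [180, 275, 400]

error_pct = mean_absolute_percentage_error(actual, forecast)
accuracy_pct = 100 - error_pct  # assumed definition of "accuracy"
print(round(error_pct, 1), round(accuracy_pct, 1))
```

Running the same calculation once per model gives a like-for-like way to compare them on the held-out months.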

 

Enhancing the data: We wanted to perform cluster analysis to see the impact that deprivation has on crime levels, so we brought in the ‘Indices of Deprivation 2015’ dataset [6]. The data contains seven relative measures of deprivation for small areas (Lower-layer Super Output Areas) across England.

 

We took the indicators for each domain except crime, since crime is what we were forecasting, and used them for our clustering analysis. To do this effectively, we needed to find out which of these factors were more likely to impact crime and exclude those that weren’t. The Association Analysis tool produced a correlation matrix:

4.png
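As a rough illustration of how such a matrix ranks candidate indicators, the Python sketch below computes a Pearson correlation between each indicator and a crime rate. Pearson is an assumption here, since the measure used is not named above, and the indicator names and values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-area values: two deprivation indicators and a crime rate.
crime_rate = [12.0, 18.0, 25.0, 31.0, 40.0]
indicators = {
    "income_score": [0.10, 0.15, 0.22, 0.27, 0.35],
    "living_environment_score": [30.0, 12.0, 25.0, 18.0, 21.0],
}

# Rank indicators by the strength of their association with crime.
ranked = sorted(
    ((name, pearson(values, crime_rate)) for name, values in indicators.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name}: r = {r:.2f}")
```

Indicators with a coefficient close to zero are natural candidates to drop before clustering.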

Based on the correlation results from the matrix, five fields had a high confidence level. We reduced these further using several techniques, keeping only those likely to increase the accuracy of predictions. We then applied the K-Centroids Diagnostics tool to determine the optimal number of clusters, and the K-Centroids Cluster Analysis tool to assign each area to a cluster.

 

5.png
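For readers who want to see the idea behind the cluster assignment, here is a minimal k-means sketch in plain Python. It assumes k-means as the clustering method (the K-Centroids tool offers other methods too, and the one used is not stated above), initialises the centroids naively from the first k points, and runs on made-up two-indicator scores.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iterations=100):
    """Return (labels, centroids) after iterating assignments and centroid updates."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Naive initialisation: the first k points (a simplification for this sketch).
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assign each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical scaled deprivation scores (two indicators) for six areas.
areas = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]
labels, centroids = k_means(areas, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Choosing k is a separate question; a diagnostic step would compare several values of k before settling on one.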

 

We exported the data from Alteryx and loaded all the relevant files into Qlik Sense for visualisation, including the KML files for the maps ([3], [4]).


6.png
7.png
 

 

Describe the benefits you have achieved:
 

Using Alteryx allowed us to combine vast amounts of unstructured public data from multiple sources and easily prepare it for advanced analytics. We used advanced techniques to produce meaningful insights so that we could understand and be ready for what is highly likely to happen: the key to a stable foundation for delivering valuable, actionable insight that supports high-quality decision making. We predicted crime rates with a high degree of accuracy and demonstrated how predictive analytics can be applied to real-world scenarios. We showcased our results using Qlik Sense, a highly accessible tool that lets viewers understand the story we are telling. Our findings were presented to police force staff and the press in a webinar, and the feedback was overwhelmingly positive.

 

Related Resources

YouTube: Watch the replay of the webinar demonstration

 

Download our technical brief for step-by-step instructions

 

 

Data Sources:

 

[1] All crime data

https://data.police.uk/data/

 

[2] Local Authority District to Region (December 2016) Lookup in England

http://geoportal.statistics.gov.uk/datasets/local-authority-district-to-region-december-2016-lookup-...

 

[3] KML for areas - English Districts, UAs and London Boroughs, 2011

https://borders.ukdataservice.ac.uk/easy_download_data.html?data=England_lad_2011

[4] KML for regions - Regions (December 2017) Ultra Generalised Clipped Boundaries in England

http://geoportal.statistics.gov.uk/datasets/regions-december-2017-ultra-generalised-clipped-boundari...

 

[5] Demographics data - Lower Super Output Area Mid-Year Population Estimates

https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/datas...

 

[6] English indices of deprivation 2015

https://www.gov.uk/government/statistics/english-indices-of-deprivation-2015

Comments
epark88
5 - Atom

Great work from the team!

Seem
6 - Meteoroid

This is a great hands-on example of predictive modelling. Thank you for sharing it. While going through the case study I ran into some questions. The objective of the second part of the problem was to see the impact deprivation had on crime. Given that, crime is our dependent variable and the deprivation measures are the predictors. The deprivation index, in turn, was calculated from the indicators underlying each of the six domains of deprivation. To understand which indicators are most responsible for crime, the correlation between these indicators and crime needed to be explored. In your solution, I struggled to understand how you established this correlation between crime and the indicators of deprivation. Which indicators did you finally consider? Was your cluster analysis based on these indicators of deprivation? If not, what was your objective for clustering, and which variables were used as predictors? Thanks again, and I look forward to your reply.

 

Regards,

Seema Sutradhar