Abstract

The relationship between crime patterns and their associated variables has drawn considerable attention, and these variables play a critical role in crime prediction. Traditional regression models can reveal the contribution of each variable, but they are not optimal for crime prediction. Machine learning models predict crime more effectively, but most of them cannot estimate the contribution of each individual variable. This study aims to overcome this limitation by taking advantage of the interpretability of advanced machine learning models. Based on routine activity theory and crime pattern theory, this study selects 17 variables for crime prediction. The XGBoost algorithm is adopted to train the prediction model. A post-hoc interpretation method, Shapley additive explanations (SHAP), is used to discern the contribution of individual variables: a variable with a higher SHAP value contributes more to the crime prediction model. In addition to the global model for the entire area, a local model is calibrated at each study unit, revealing the spatial variation in each variable's contribution. Among the 17 variables used in the model, the proportion of the non-local population and the ambient population aged 25–44 contribute more than the other variables to predicting crime. The larger the ambient population aged 25–44 in an area, the more public thefts occur. Additionally, local SHAP values are mapped to show each variable's contribution to the crime prediction model across the study area. The results of the local models can help the police tackle the most important factors at each location, while the global model identifies the important factors for the entire region.
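The idea of attributing a prediction to individual variables, both per study unit and for the whole area, can be illustrated with a minimal sketch. It does not reproduce the study's pipeline: instead of XGBoost and the SHAP library, it computes exact Shapley values by brute force for a small hypothetical prediction function. The feature names, coefficients, and data values (`nonlocal_share`, `ambient_25_44`, `poi_density`, `toy_predict`, `units`) are assumptions made up for illustration, and averaging absolute per-unit values into a global ranking is a common convention rather than something the abstract specifies.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names, chosen only for illustration.
FEATURES = ["nonlocal_share", "ambient_25_44", "poi_density"]


def toy_predict(v):
    """Made-up stand-in for a trained model (NOT the study's XGBoost model)."""
    return 3.0 * v[0] + 2.0 * v[1] + 1.0 * v[1] * v[2]


def shapley_values(predict, x, background):
    """Exact Shapley values for one sample x relative to a background sample.

    A feature outside the coalition takes its background value.
    Cost grows exponentially with the number of features, so this is
    only practical for tiny examples.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else background[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else background[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


# Made-up values for three study units (rows) and three features (columns).
units = [
    [0.2, 1.0, 0.5],
    [0.6, 3.0, 1.5],
    [0.4, 2.0, 1.0],
]

# Background = per-feature mean over all units.
background = [sum(col) / len(units) for col in zip(*units)]

# Local view: one attribution vector per study unit (what would be mapped).
local_shap = [shapley_values(toy_predict, x, background) for x in units]

# Global view: mean absolute attribution per feature over all units.
global_importance = [
    sum(abs(row[i]) for row in local_shap) / len(local_shap)
    for i in range(len(FEATURES))
]

for name, score in sorted(zip(FEATURES, global_importance),
                          key=lambda pair: -pair[1]):
    print(f"{name}: {score:.3f}")
```

For each unit, the attributions sum to the difference between that unit's prediction and the prediction at the background point, which is the additive property that gives the method its name. With a real tree ensemble, a tree-specific explainer would replace the brute-force loop.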
