Abstract
Traditional crime prediction models based on census data are limited, as they fail to capture the complexity and dynamics of human activity. With the rise of ubiquitous computing, there is an opportunity to improve such models with data that serve as better proxies for human presence in cities. In this paper, we leverage large-scale human mobility data to craft an extensive set of features for crime prediction, informed by theories in criminology and urban studies. We employ averaging and boosting ensemble techniques from machine learning to investigate their power in predicting yearly counts of different types of crime in New York City at the census tract level. Our study shows that spatial and spatio-temporal features derived from Foursquare venues and check-ins, subway rides, and taxi rides improve on the baseline models relying on census and POI data. The proposed models achieve absolute R^{2} values of up to 65% (on a geographical out-of-sample test set) and up to 89% (on a temporal out-of-sample test set). This indicates that, alongside the residential population of an area, its ambient population is strongly predictive of the area’s crime levels. We examine the main crime categories in detail and find that the predictive gain of the human dynamics features varies across crime types: such features bring the largest gain for grand larcenies, whereas assaults are already well predicted by the census features. Furthermore, we identify and discuss the top predictive features for the main crime categories. These results offer valuable insights for those responsible for urban policy or law enforcement.
Highlights
Random forests are very popular in practice, as they are easy to use, robust, and yield good performance.
Extra-Trees add a third level of randomization compared to random forests: in addition to the random subsets of samples and features, the split tests at each node of the decision trees are random.
They sometimes yield better performance thanks to the resulting smoothing effect, and they remove the computational burden of determining optimal cut-points in random forests.
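The difference between the two split strategies can be illustrated with a minimal sketch. It assumes a single numeric feature, a single tree node, and variance reduction as the split criterion; the function names and toy data are hypothetical and not taken from the paper. Full implementations also subsample features and, in Extra-Trees, usually draw one random cut per candidate feature and keep the best of those.

```python
import random
from statistics import pvariance


def weighted_child_variance(xs, ys, cut):
    """Size-weighted variance of the targets after splitting at `cut`."""
    left = [y for x, y in zip(xs, ys) if x <= cut]
    right = [y for x, y in zip(xs, ys) if x > cut]
    if not left or not right:
        return float("inf")  # degenerate split: one side is empty
    return (len(left) * pvariance(left) + len(right) * pvariance(right)) / len(ys)


def best_cut(xs, ys):
    """Random-forest-style split: search all midpoints for the optimal cut."""
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(candidates, key=lambda c: weighted_child_variance(xs, ys, c))


def random_cut(xs, rng):
    """Extra-Trees-style split: draw the cut uniformly between min and max."""
    return rng.uniform(min(xs), max(xs))


# Hypothetical toy data: one feature with two clearly separated groups.
xs = [1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0]
ys = [5.0, 6.0, 5.0, 6.0, 20.0, 21.0, 20.0, 21.0]

rng = random.Random(0)  # seeded for reproducibility
optimal = best_cut(xs, ys)        # 7.0 for this toy data
randomized = random_cut(xs, rng)  # some value in [1.0, 13.0]

print("optimal cut:", optimal, "score:", weighted_child_variance(xs, ys, optimal))
print("random cut:", randomized, "score:", weighted_child_variance(xs, ys, randomized))
```

The exhaustive search evaluates every candidate cut-point, while the random draw costs a single call. A single random cut is never better than the optimal one on the training data, but averaging many such randomized trees is what produces the smoothing effect mentioned above.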
Summary
Crime analysis has already confirmed that crimes are unequally distributed in time and space [1]. Knowing when and where crime is more likely to occur can help various actors engaged in crime reduction: urban planners to design safer cities [3] and police forces to better direct their patrols [4]. Criminological studies have focused solely on socio-demographic attributes as factors correlating with victimization and have noticed that specific groups of people tend to have lifestyles that expose them to a higher risk of victimization. Under the umbrella of Social Disorganization Theory, a series of criminological studies have explained crime as a product of the ecological attributes of the neighborhood: ethnicity, income level, and residential stability [6, 7].