Predicting and Preventing Crime: A Crime Prediction Model Using San Francisco Crime Data by Classification Techniques

Muzammil Khan, Yasser Alharbi, Azmat Ali, Gonzalo Farias

doi:10.1155/2022/4830411


Open Access

https://doi.org/10.1155/2022/4830411


Journal: Complexity	Publication Date: Feb 25, 2022
Citations: 15	License type: CC BY 4.0

Affiliation: University of Swat, University of Ha'il, Wuhan University

Abstract

Crime is difficult to predict: it appears random and can occur almost anywhere at any time, which makes it a challenging issue for any society. The study proposes a crime prediction model by analyzing and comparing three well-known classification algorithms: Naive Bayes, Random Forest, and Gradient Boosting Decision Tree. The model focuses on the ten most frequent crime categories, which together account for 97% of the incidents. Two broad crime classes, violent and nonviolent, are created by merging multiple smaller crime categories. Exploratory data analysis (EDA) is performed on a crime dataset to identify patterns and understand crime trends. The accuracies of the Naive Bayes, Random Forest, and Gradient Boosting Decision Tree techniques are 65.82%, 63.43%, and 98.5%, respectively, and the proposed model is further evaluated using precision and recall metrics. The results show that, based on historical data from a city, the Gradient Boosting Decision Tree model predicts crime better than the other two techniques. The analysis and prediction model can help security agencies use their resources efficiently, anticipate crime at specific times, and serve society well.
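The data-preparation and evaluation steps described in the abstract (keeping the most frequent categories, merging them into violent and nonviolent classes, and scoring predictions by accuracy, precision, and recall) can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' code: the `VIOLENT` set is a hypothetical grouping because the summary does not list which categories were merged into each class, filtering before merging is an assumed order, and the function names are made up for this example.

```python
from collections import Counter

# HYPOTHETICAL grouping for illustration; the paper's actual
# violent/nonviolent mapping is not given in this summary.
VIOLENT = {"ASSAULT", "ROBBERY", "SEX OFFENSES FORCIBLE", "KIDNAPPING"}


def top_k_categories(categories, k=10):
    """Return the k most frequent categories and the share of records they cover."""
    counts = Counter(categories)
    top = [cat for cat, _ in counts.most_common(k)]
    coverage = sum(counts[cat] for cat in top) / len(categories)
    return top, coverage


def to_binary_class(category):
    """Merge a fine-grained category into 'violent' or 'nonviolent' (assumed mapping)."""
    return "violent" if category in VIOLENT else "nonviolent"


def evaluate(y_true, y_pred, positive="violent"):
    """Accuracy over all labels, plus precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


if __name__ == "__main__":
    records = ["LARCENY/THEFT"] * 4 + ["ASSAULT"] * 3 + ["ROBBERY"] * 2 + ["ARSON"]
    top, coverage = top_k_categories(records, k=3)
    # Keep only the most frequent categories, then merge them into two classes.
    labels = [to_binary_class(r) for r in records if r in top]
    print(top, coverage, labels)
```

Run over a real category column, `top_k_categories(..., k=10)` reports the share of incidents covered by the ten most frequent categories, which is one way a coverage figure such as the 97% quoted above could be checked.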

Highlights

Data mining is the knowledge discovery process used to collect and analyze a large dataset and summarize it with helpful information
Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes’ theorem with strong independence assumptions between the features
Gradient Boosting Decision Tree (GBDT) is a robust machine learning technique used in predictive modeling because of its high prediction accuracy compared with other modeling techniques
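To make the Naive Bayes independence assumption concrete, here is a minimal categorical Naive Bayes classifier with add-alpha (Laplace) smoothing. It scores each class c as log P(c) + sum_j log P(x_j | c), treating every feature as independent of the others given the class. The summary does not say which Naive Bayes variant the authors used (for example Gaussian or categorical), so this variant, the class name `CategoricalNaiveBayes`, and the district/day-of-week sample features are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with add-alpha (Laplace) smoothing.

    Each feature is assumed conditionally independent of the others given
    the class, which is the "naive" independence assumption.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.n = len(y)
        n_features = len(X[0])
        # feature_counts[j][c][v]: how often feature j takes value v in class c
        self.feature_counts = [defaultdict(Counter) for _ in range(n_features)]
        # feature_values[j]: distinct values of feature j seen in training
        self.feature_values = [set() for _ in range(n_features)]
        for row, label in zip(X, y):
            for j, value in enumerate(row):
                self.feature_counts[j][label][value] += 1
                self.feature_values[j].add(value)
        return self

    def log_posterior(self, row, label):
        """Unnormalized log P(c | x): log P(c) + sum_j log P(x_j | c)."""
        score = math.log(self.class_counts[label] / self.n)
        for j, value in enumerate(row):
            count = self.feature_counts[j][label][value]
            k = len(self.feature_values[j])
            score += math.log(
                (count + self.alpha) / (self.class_counts[label] + self.alpha * k)
            )
        return score

    def predict(self, row):
        """Return the class with the highest smoothed log-posterior."""
        return max(self.class_counts, key=lambda c: self.log_posterior(row, c))


if __name__ == "__main__":
    X = [("MISSION", "Friday"), ("MISSION", "Saturday"), ("TENDERLOIN", "Friday"),
         ("RICHMOND", "Monday"), ("RICHMOND", "Tuesday")]
    y = ["violent", "violent", "violent", "nonviolent", "nonviolent"]
    model = CategoricalNaiveBayes().fit(X, y)
    print(model.predict(("MISSION", "Friday")), model.predict(("RICHMOND", "Monday")))
```

Summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow when there are many features, and the smoothing term keeps a feature value never seen with a class from forcing that class's probability to zero.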

Summary

Introduction

Data mining is the knowledge discovery process used to collect and analyze a large dataset and summarize it with helpful information. It serves analytical purposes across many fields of science and plays an essential role in human life and in areas such as education, business, medicine, health, and science. Data mining is the process of discovering valid, understandable, and useful patterns and valuable information in large amounts of data [1]. San Francisco Crime Classification is an open-source dataset available for an online competition administered by Kaggle Inc. The main task in the dataset is to predict the crime category based on a given set of geographical and time-based variables. The city's limited and constrained police resources prove insufficient to handle its law and order issues.
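Since the prediction task takes geographical and time-based variables as inputs, a minimal sketch of deriving such features from one raw incident record may help. It assumes Kaggle-style field names (`Dates` as a `YYYY-MM-DD HH:MM:SS` timestamp, `PdDistrict` for the police district, and `X`/`Y` for longitude and latitude); these names, the helper `extract_features`, and the `is_night` flag with its 20:00 to 06:00 window are assumptions for illustration and are not taken from the paper.

```python
from datetime import datetime


def extract_features(record):
    """Derive time-based and geographic features from one raw incident record.

    Assumes Kaggle-style fields: 'Dates' as 'YYYY-MM-DD HH:MM:SS',
    'PdDistrict' as a district name, and 'X'/'Y' as longitude/latitude strings.
    """
    ts = datetime.strptime(record["Dates"], "%Y-%m-%d %H:%M:%S")
    return {
        # Time-based features
        "hour": ts.hour,
        "weekday": ts.weekday(),  # Monday == 0 ... Sunday == 6
        "month": ts.month,
        "year": ts.year,
        "is_night": ts.hour >= 20 or ts.hour < 6,  # HYPOTHETICAL 20:00-06:00 window
        # Geographic features
        "district": record["PdDistrict"],
        "longitude": float(record["X"]),
        "latitude": float(record["Y"]),
    }


if __name__ == "__main__":
    sample = {
        "Dates": "2015-05-13 23:53:00",
        "PdDistrict": "NORTHERN",
        "X": "-122.425891675136",
        "Y": "37.7745985956747",
    }
    print(extract_features(sample))
```

Each resulting dictionary would still need encoding (for example, one-hot encoding the district) before being passed to any of the three classifiers; that step is likewise an assumption, since the summary does not describe the authors' feature engineering.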

Methods

Results

Conclusion