A good machine learning model can contribute greatly to accurate crime prediction, so researchers increasingly choose advanced models over basic ones. To find out whether advanced models have a clear advantage, this study shifts its focus from obtaining crime predictions to comparing the performance of these two types of models on crime prediction. We aimed to predict burglary occurrence in the City of Los Angeles, and compared a basic model that uses only the prior year's burglary occurrence with two advanced models: a linear regressor and a random forest regressor. In addition, American Community Survey data were used to provide neighborhood-level socio-economic features. After data preprocessing steps to regularize the dataset, recursive feature elimination was used to determine the final features and the parameters of the two advanced models. Finally, to identify the best-fitting model, three metrics were used to evaluate model performance: R squared, adjusted R squared, and mean squared error. The results indicate that the linear regressor is the most suitable of the three models applied in the study, with a slightly smaller mean squared error than the basic model, whereas the random forest model performed worse than the basic model. Despite their much more complex learning steps, the advanced models did not show a prominent advantage, and directions for extending the current study are discussed.
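The comparison described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the study's code: the data are synthetic (200 hypothetical neighborhoods, a prior-year count, and five made-up features standing in for American Community Survey variables), the basic model is assumed to carry last year's count forward unchanged, and recursive feature elimination is hand-rolled for ordinary least squares by repeatedly dropping the feature with the smallest standardized coefficient. The random forest regressor is omitted to keep the sketch dependency-free; adjusted R squared follows the usual form $1 - (1 - R^2)\frac{n-1}{n-p-1}$ for $n$ observations and $p$ predictors.

```python
import numpy as np


def r2(y, y_hat):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def adjusted_r2(y, y_hat, p):
    # Penalizes R^2 for the number of predictors p: 1 - (1 - R^2)(n - 1)/(n - p - 1).
    n = len(y)
    return 1.0 - (1.0 - r2(y, y_hat)) * (n - 1) / (n - p - 1)


def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))


def fit_ols(X, y):
    # Ordinary least squares with an intercept; returns [intercept, coef_1, ..., coef_k].
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]


def rfe(X, y, n_keep):
    # Recursive feature elimination for OLS: refit on the remaining columns,
    # drop the one with the smallest absolute standardized coefficient, repeat.
    keep = list(range(X.shape[1]))
    while len(keep) > n_keep:
        Xs = X[:, keep]
        Z = (Xs - Xs.mean(axis=0)) / Xs.std(axis=0)
        weights = np.abs(fit_ols(Z, y)[1:])
        keep.pop(int(np.argmin(weights)))
    return keep


# Hypothetical synthetic data for 200 neighborhoods; NOT the study's data.
rng = np.random.default_rng(0)
n = 200
prior = rng.poisson(20, n).astype(float)  # prior-year burglary count
socio = rng.normal(size=(n, 5))           # stand-ins for ACS socio-economic features
current = prior + 2.0 * socio[:, 0] - 1.5 * socio[:, 1] + rng.normal(0.0, 1.0, n)

X = np.column_stack([prior, socio])
train, test = slice(0, 150), slice(150, n)
y_test = current[test]

# Basic model (assumed form): this year's count equals last year's count.
base_pred = prior[test]

# Advanced linear model: RFE down to 3 features, then OLS on the kept features.
keep = rfe(X[train], current[train], n_keep=3)
coef = fit_ols(X[train][:, keep], current[train])
lin_pred = predict_ols(coef, X[test][:, keep])

# The basic model uses one input (prior-year count), so p = 1 for adjusted R^2.
for name, pred, p in [("basic", base_pred, 1), ("linear", lin_pred, len(keep))]:
    print(f"{name:>6}: R2={r2(y_test, pred):.3f} "
          f"adjR2={adjusted_r2(y_test, pred, p):.3f} MSE={mse(y_test, pred):.3f}")
```

Because the synthetic target is built from the prior-year count plus two of the made-up features, the linear model beats the carry-forward baseline here by construction; the numbers say nothing about the study's Los Angeles results.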