Abstract

The spatial decomposition of demographic data at a fine resolution is a classic and crucial problem in geographical information science. The main objective of this study was to compare twelve well-known machine learning regression algorithms for the spatial decomposition of demographic data using multisource geospatial data. Grid search and cross-validation were used to select the optimal model parameters. All the global regression algorithms used in the study produced acceptable results except the ordinary least squares (OLS) algorithm. Both the regularization method and the subsetting method helped alleviate overfitting in the OLS model, with regularization performing better than subsetting. The nonlinear regression algorithms performed better than the linear regression algorithms, which implies that the relationship between population density and its influencing factors is likely to be nonlinear. Among the global regression algorithms used in the study, the k-nearest neighbors (KNN) regression algorithm achieved the best results. In addition, multisource geospatial data significantly improved the accuracy of the spatial decomposition results, and thus the method proposed in this study can be applied to spatial decomposition studies in other areas.
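The tuning procedure mentioned above (grid search scored by cross-validation) can be illustrated with a small sketch. This is not the study's implementation: the synthetic data, the candidate values of k, the five folds, the RMSE score, and all function names are assumptions made for illustration, and the code uses only the Python standard library so that it runs without dependencies. In practice a library routine such as scikit-learn's `GridSearchCV` would usually be preferred.

```python
import math
import random


def knn_predict(train_X, train_y, x, k):
    """Predict by averaging the targets of the k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def kfold_indices(n, n_folds, seed=0):
    """Shuffle the indices 0..n-1 and split them into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def cv_rmse(X, y, k, n_folds=5):
    """Mean RMSE of KNN regression over the held-out folds."""
    scores = []
    for fold in kfold_indices(len(X), n_folds):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        sq_err = [(knn_predict(train_X, train_y, X[i], k) - y[i]) ** 2
                  for i in fold]
        scores.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(scores) / len(scores)


def grid_search(X, y, k_grid, n_folds=5):
    """Return (best_k, scores), where scores maps each k to its CV RMSE."""
    scores = {k: cv_rmse(X, y, k, n_folds) for k in k_grid}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in data (an assumption): two covariates per grid cell and a
# nonlinear response with noise, playing the role of population density.
rng = random.Random(42)
X = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(200)]
y = [math.sin(3 * a) + b ** 2 + rng.gauss(0, 0.1) for a, b in X]

k_grid = [1, 3, 5, 9, 15]  # hypothetical candidate values
best_k, scores = grid_search(X, y, k_grid)
print("best k:", best_k)
for k in k_grid:
    print(f"k={k:2d}  CV RMSE={scores[k]:.4f}")
```

Every candidate k is scored on the same folds, so the comparison between parameter values is not affected by differences in how the data were split.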

Highlights

  • Information about fine-scale population distribution is essential in many areas, including urban planning and management [1], natural disaster response [2], infectious disease prevention and control [3], resource allocation, and environment protection [4]

  • Because nonlinear regression models deal better with collinearity among the independent variables and with other problems that lead to overfitting, we suggest that research on the spatial decomposition of demographic data give priority to nonlinear regression models to improve the accuracy of results

  • This paper compares the use of twelve machine learning regression algorithms in gridded population mapping of Guangzhou city, China

Introduction

Information about fine-scale population distribution is essential in many areas, including urban planning and management [1], natural disaster response [2], infectious disease prevention and control [3], resource allocation, and environmental protection [4]. Accurate population distribution data are fundamental for achieving the urban Sustainable Development Goals (SDGs) [5,6]. The census is the main way of collecting population data in most countries. However, the spatial resolution and update frequency of census data are too low to meet the requirements of modern urban governance. Fine-scale and accurate population information is essential for exploring the relationship between urban residents and the built environment [1].
