Abstract

High-resolution gridded population data are important for understanding and responding to many socioeconomic and environmental problems. Local population estimates allow officials and researchers to plan better at the local level (e.g., by optimizing public services and facilities). This study used a random forest algorithm, based on remote sensing data (i.e., satellite imagery) and social sensing data (i.e., points of interest and building footprints), to disaggregate census population data for the five municipal districts of Zhengzhou City, China, onto 100 × 100 m grid cells. We used a statistical tool to detect areas with an abnormal population density (e.g., areas containing many empty houses or houses rented by more people than allowed) and conducted field work to validate our findings. Results showed that some categories of points of interest, such as residential communities, parking lots, banks, and government buildings, contributed most to modeling the spatial distribution of the residential population in Zhengzhou City. Excluding areas with an abnormal population density from model training and dasymetric mapping increased the accuracy of population estimates in the remaining areas, where population density is more typical. We compared our product with three widely used gridded population products: WorldPop, the Gridded Population of the World, and the 1-km Grid Population Dataset of China. The relative accuracy of our modeling approach was higher than that of those three products in the five municipal districts of Zhengzhou. This study demonstrated the potential of combining remote and social sensing data to estimate population density in urban areas more accurately, with minimal interference from areas of abnormal population density.
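The dasymetric redistribution step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each grid cell already has a non-negative weight, for example a density predicted by a model such as a random forest; the function name, variable names, toy values, and the even-split fallback are hypothetical and are not taken from the study's own pipeline.

```python
from collections import defaultdict


def dasymetric_redistribute(unit_totals, cell_units, cell_weights):
    """Split each census unit's total across its cells in proportion to weight."""
    # Sum the weights and count the cells inside every census unit.
    weight_sums = defaultdict(float)
    cell_counts = defaultdict(int)
    for unit, weight in zip(cell_units, cell_weights):
        weight_sums[unit] += weight
        cell_counts[unit] += 1

    estimates = []
    for unit, weight in zip(cell_units, cell_weights):
        if weight_sums[unit] > 0:
            share = weight / weight_sums[unit]
        else:
            # Assumption: a unit whose cells all have zero weight is split evenly.
            share = 1.0 / cell_counts[unit]
        estimates.append(unit_totals[unit] * share)
    return estimates


# Hypothetical toy data: two census units covering five grid cells.
unit_totals = {"A": 1000.0, "B": 300.0}
cell_units = ["A", "A", "A", "B", "B"]
cell_weights = [5.0, 3.0, 2.0, 0.0, 0.0]  # stand-in for model-predicted densities

cell_population = dasymetric_redistribute(unit_totals, cell_units, cell_weights)
```

Because each unit's shares sum to one, the cell estimates add back up to the census total of the unit they belong to, which is the defining property of this kind of redistribution.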

Highlights

  • The results showed that, whether or not areas with an abnormal population density were excluded from the study area, the population datasets generated with point-of-interest (POI) and building footprint data achieved better RMSE and MAE than those generated with the other variable combinations (road and POI variables; all variables)
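The two error measures named in the last highlight, root mean square error (RMSE) and mean absolute error (MAE), can be computed as in the sketch below. The function names and the toy values are illustrative assumptions, not figures from the study.

```python
import math


def rmse(observed, estimated):
    """Root mean square error: penalizes large deviations more heavily."""
    squared = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, estimated):
    """Mean absolute error: the average size of the deviations."""
    absolute = [abs(o - e) for o, e in zip(observed, estimated)]
    return sum(absolute) / len(absolute)


# Hypothetical census counts and gridded estimates summed back to the same units.
observed = [100.0, 200.0, 300.0]
estimated = [110.0, 190.0, 330.0]

print(f"RMSE = {rmse(observed, estimated):.2f}, MAE = {mae(observed, estimated):.2f}")
```

Lower values of either measure indicate estimates closer to the reference counts; RMSE is never smaller than MAE on the same data.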

Summary

Introduction

Up-to-date, spatially accurate population datasets are fundamental to many aspects of decision making and risk assessment, such as economic development, disaster response, and public health research [1,2,3,4]. Several well-known global efforts generate high-resolution gridded population data using these approaches, including Gridded Population of the World (GPW) [17], Global Rural-Urban Mapping Project (GRUMP) [18], LandScan Global [3], Global Human Settlement Population Grid datasets (GHS-POP) [19], and WorldPop [20]. Most of these products use a combination of Remote Sensing (RS) data as ancillary data, including Land Use/Land-Cover (LULC), Nighttime Light (NTL), temperature, and precipitation, to transform traditional choropleth population maps into a continuous gridded population surface, with aggregated values redistributed across regular spatial units [2,10,12,21]. These data are high-dimensional (i.e., made up of many variables), so advanced methods that can handle high-dimensional data should be adopted.

