Abstract

Evaluating gridded population estimates (e.g., WorldPop, LandScan) or household survey sample designs requires a population census linked to residential locations. Geolocated census microdata, however, are almost never available and are thus best simulated. In this paper, we simulate a close-to-reality population of individuals nested in households geolocated to realistic building locations. Using the R simPop package and ArcGIS, multiple realizations of a geolocated synthetic population are derived from the Namibia 2011 census 20% microdata sample, Namibia census enumeration area boundaries, the Namibia 2013 Demographic and Health Survey (DHS), and dozens of spatial covariates derived from publicly available datasets. Realistic household latitude-longitude coordinates are manually generated based on public satellite imagery. Simulated households are linked to latitude-longitude coordinates by identifying distinct household types with multivariate k-means analysis and modelling a probability surface for each household type using Random Forest machine learning methods. We simulate five realizations of a synthetic population in Namibia’s Oshikoto region, including demographic, socioeconomic, and outcome characteristics at the level of household, woman, and child. Variables in the synthetic population were compared with the 2011 census 20% sample and 2013 DHS data by primary sampling unit/enumeration area. We found that variable distributions in the synthetic population matched the observed data and followed expected spatial patterns. We outline a novel process to simulate a close-to-reality microdata census geolocated to realistic building locations in a low- or middle-income country setting, to support spatial demographic research and survey methodological development while avoiding disclosure risk for individuals.
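The linkage step described above (household types, a probability surface per type, placement at building coordinates) can be illustrated with a minimal sketch. It is not the authors' R/ArcGIS workflow: it assumes household types and per-building weights are already available, for example from a clustering step and a fitted classifier, and simply draws one distinct building per household in proportion to those weights. The function name `assign_households`, the coordinates, and the weights are all hypothetical.

```python
import random

def assign_households(households, buildings, weights_by_type, seed=0):
    """Place each household at a distinct building point.

    households: list of (household_id, household_type) pairs.
    buildings: list of (lat, lon) tuples (hypothetical coordinates).
    weights_by_type: maps a household type to one non-negative weight per
        building, standing in for a modelled probability surface.
    """
    rng = random.Random(seed)
    free = set(range(len(buildings)))  # building indices not yet occupied
    placement = {}
    for hh_id, hh_type in households:
        candidates = sorted(free)
        weights = [weights_by_type[hh_type][i] for i in candidates]
        if not candidates or sum(weights) <= 0:
            raise ValueError(f"no available building for household {hh_id}")
        # Draw one free building with probability proportional to its weight.
        chosen = rng.choices(candidates, weights=weights, k=1)[0]
        placement[hh_id] = buildings[chosen]
        free.remove(chosen)
    return placement

# Illustrative inputs only; none of these values come from the paper.
buildings = [(-18.80, 16.90), (-18.81, 16.92), (-18.79, 16.95), (-18.83, 16.97)]
households = [("h1", "small"), ("h2", "large"), ("h3", "small")]
weights_by_type = {
    "small": [0.6, 0.3, 0.1, 0.0],
    "large": [0.0, 0.1, 0.2, 0.7],
}
placement = assign_households(households, buildings, weights_by_type, seed=42)
for hh_id, coords in placement.items():
    print(hh_id, coords)
```

Running the function with different seeds gives different placements from the same weights, which is one simple way to think about multiple realizations of a geolocated population.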

Highlights

  • Before releasing our simulated data, we closely reviewed papers on the privacy of synthetic population data, including one by Alfons and Templ (2010), who calculated the disclosure risk of close-to-reality synthetic data generated with the simPop [R package] algorithm used in this analysis [53]

Introduction

The ideal resource for evaluating the accuracy of gridded population datasets and certain household survey methodologies would be a complete set of individual records from a population linked to location of residence, though this is generally not available. Evaluations of various gridded population datasets have assessed the accuracy of population counts at the geographic scale of the input census data [3,4,5], and other analyses have evaluated whether cells were accurately classified as populated or not populated [6]; the accuracy of population counts per grid cell has not been evaluated because it requires a geo-located. In the realm of household surveys, evaluating sample variability, measurement error, and missing values due to sample design requires a close-to-reality census of microdata on which to perform statistical simulations of repeated household samples [7]. Synthetic population datasets have an advantage over actual census data in that multiple scenarios can be generated to test outcomes in potential future populations.
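To make the idea of repeated household samples concrete, the following is a minimal sketch using only the Python standard library. It builds a toy "census" of household sizes grouped into equal-sized clusters, draws many two-stage samples (clusters, then households), and compares the spread of the sample means with the known census mean. The toy census, the function names, and all parameter values are assumptions made for illustration; they are not the Namibia data or the authors' procedure.

```python
import random
import statistics

def make_toy_census(n_clusters=50, households_per_cluster=40, seed=1):
    """Build a toy census: one list of household sizes per cluster."""
    rng = random.Random(seed)
    census = []
    for _ in range(n_clusters):
        cluster_mean = rng.uniform(3.0, 7.0)  # clusters differ in typical size
        census.append([max(1, round(rng.gauss(cluster_mean, 1.5)))
                       for _ in range(households_per_cluster)])
    return census

def two_stage_sample_mean(census, n_clusters, n_households, rng):
    """Sample clusters, then households within each, and return the mean size."""
    clusters = rng.sample(census, n_clusters)
    sizes = [s for cluster in clusters for s in rng.sample(cluster, n_households)]
    return statistics.mean(sizes)

def simulate(census, n_clusters=10, n_households=8, replicates=500, seed=2):
    """Repeat the sample design and summarise the resulting estimates."""
    rng = random.Random(seed)
    estimates = [two_stage_sample_mean(census, n_clusters, n_households, rng)
                 for _ in range(replicates)]
    truth = statistics.mean(s for cluster in census for s in cluster)
    return truth, statistics.mean(estimates), statistics.stdev(estimates)

census = make_toy_census()
truth, mean_estimate, sd_estimate = simulate(census)
print(f"census mean household size: {truth:.3f}")
print(f"mean of sample estimates:   {mean_estimate:.3f}")
print(f"sd of sample estimates:     {sd_estimate:.3f}")
```

Because the full census is known, the standard deviation of the estimates is a direct measure of the sampling variability of this design, something that cannot be computed from a single real survey.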
