Abstract

Social scientists routinely rely on methods of interpolation to adjust available data to their research needs. Spatial data from different sources are often based on different geographies that need to be reconciled, and some boundaries (e.g., administrative or political boundaries) change frequently. This study calls attention to the potential for substantial error in efforts to harmonize data to constant boundaries using standard approaches to areal and population interpolation. The case in point is census tract boundaries in the United States, which are redefined before every decennial census. Research on neighborhood effects and neighborhood change relies heavily on estimates of local area characteristics for consistent areas over time, and it now routinely uses interpolated estimates offered by sources such as the Neighborhood Change Data Base (NCDB) and the Longitudinal Tract Data Base (LTDB). We identify a fundamental problem with how these estimates are created, and we reveal an alarming level of error in estimates of population characteristics in 2000 within 2010 boundaries. We do this by comparing estimates from one of these sources (the LTDB) to true values calculated by re-aggregating original 2000 census microdata to 2010 tract areas. We then demonstrate an alternative approach that allows the re-aggregated values to be publicly disclosed, using "differential privacy" (DP) methods to inject random noise that meets Census Bureau standards for protecting the confidentiality of the raw data. We show that the DP estimates are considerably more accurate than the LTDB estimates based on interpolation, and we examine the conditions under which interpolation is more susceptible to error. This study reveals cause for greater caution in the use of interpolated estimates from any source.
Until and unless DP estimates can be publicly disclosed for a wide range of variables and years, research on neighborhood change should routinely examine data for signs of estimation error that may be substantial in a large share of tracts that experienced complex boundary changes.
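To make the contrast between interpolation and re-aggregation concrete, the sketch below uses hypothetical toy numbers: one 2000 tract is split across two 2010 tracts, and its population is allocated to the pieces by plain area weighting. This is deliberately the simplest form of areal interpolation and is not the LTDB or NCDB procedure. When residents are concentrated in a small part of the old tract, area weights misallocate them badly, whereas re-aggregating the underlying records recovers the true counts. The second function adds noise with a generic Laplace mechanism as an assumed stand-in for DP noise injection; it is not the Census Bureau's disclosure-avoidance method, and the function names, `epsilon`, and all figures are illustrative assumptions.

```python
import random


def area_weighted_estimate(source_total, piece_areas):
    """Allocate a source-zone total to target pieces in proportion to land area."""
    total_area = sum(piece_areas)
    return [source_total * a / total_area for a in piece_areas]


def laplace_noisy_counts(counts, epsilon, sensitivity=1.0, seed=None):
    """Generic Laplace mechanism (assumed illustration, not the Census Bureau's).

    The difference of two i.i.d. exponential draws with rate epsilon/sensitivity
    is Laplace-distributed with mean 0 and scale sensitivity/epsilon.
    """
    rng = random.Random(seed)
    rate = epsilon / sensitivity
    return [c + rng.expovariate(rate) - rng.expovariate(rate) for c in counts]


# Hypothetical toy data: one 2000 tract split into 2010 tracts A and B.
source_population = 4000       # 2000 tract total
piece_areas = [3.0, 1.0]       # land area (sq. km) falling in A and B
true_population = [500, 3500]  # counts recovered by re-aggregating microdata

estimates = area_weighted_estimate(source_population, piece_areas)
errors = [est - true for est, true in zip(estimates, true_population)]
noisy = laplace_noisy_counts(true_population, epsilon=1.0, seed=42)

print(estimates)  # [3000.0, 1000.0]
print(errors)     # [2500.0, -2500.0]
print(noisy)      # true counts plus small random noise (scale 1.0 here)
```

In this toy case the interpolation error is of the same order as the population itself, while the noise added to the re-aggregated counts is small by comparison; real DP error depends on the privacy budget and the mechanism actually used.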
