Geocoding Error, Spatial Uncertainty, and Implications for Exposure Assessment and Environmental Epidemiology.

Ellen J Kinnee, Jessie L C Shmool, Leah Schinasi, Jane E Clougherty, Fernando Holguin, Perry E Sheffield, Sheila Tripathy

doi:10.3390/ijerph17165845

Abstract

Although environmental epidemiology studies often rely on geocoding to assign spatial exposure estimates, geocoding methods are not commonly reported, nor are the consequent errors in exposure assignment explored. Geocoding methods differ in accuracy, however, and as exposure models for air pollution and other exposures become more refined, geocoding error may account for an increasingly large proportion of exposure misclassification. We used residential addresses from a reasonably large, dense dataset of asthma emergency department visits from all New York City hospitals (n = 21,183; 26.9 addresses/km²), and geocoded each using three methods (Address Point, Street Segment, Parcel Centroid). We compared missingness and spatial patterning therein, quantified distance and directional errors, and quantified impacts on pollution exposure estimates and on assignment to Census areas for sociodemographic characterization. Parcel Centroids had the highest overall missingness rate (38.1%, vs. 9.6% for Address Point and 6.1% for Street Segment), and spatial clustering in missingness was significant for all methods, though the spatial patterns differed. Street Segment geocodes had the largest mean distance error (µ = 29.2 (SD = 26.2) m; vs. µ = 15.9 (SD = 17.7) m for Parcel Centroids), and the strongest spatial patterns therein. We found substantial over- and under-estimation of pollution exposures, with greater error at higher pollutant concentrations, but minimal impact on Census area assignment. Finally, we developed surfaces of spatial patterns in errors to identify locations in the study area where exposures may be over- or under-estimated. Our observations provide insights towards refining geocoding methods for epidemiology, and suggest methods for quantifying and interpreting geocoding error with respect to exposure misclassification, towards understanding potential impacts on health effect estimates.

Highlights

A growing number of population-based studies rely on geocoding (i.e., assignment of x and y coordinates) to assign spatial exposure estimates [1,2,3,4].
We examined the misclassification of estimated concentrations using Bland–Altman plots, which display the error in concentration estimates (y-axis) as a function of the concentration itself (x-axis), taken as the mean of the concentrations derived from the reference (Address Point) and each alternative geocoding method (Figure 5).
Our results demonstrate that multiple types of geocoding error, and spatial patterning therein, may lead to systematic missingness and/or over-/under-estimation of pollution exposures, with potential implications for health effect estimates, in dense urban areas where exposures vary at a fine spatial scale.
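
The Bland–Altman comparison described in the highlights can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `bland_altman`, the sample concentrations, and the 1.96 SD limits of agreement (a common Bland–Altman convention that the text above does not mention) are all assumptions made for illustration.

```python
from statistics import mean, stdev


def bland_altman(reference, alternative):
    """Return the quantities plotted in a Bland-Altman comparison.

    reference   -- concentrations assigned at the reference geocodes
                   (Address Point in the study)
    alternative -- concentrations assigned at an alternative geocode
                   for the same addresses, in the same order
    """
    pairs = list(zip(reference, alternative))
    # x-axis: mean of the two estimates for each address
    means = [(r + a) / 2 for r, a in pairs]
    # y-axis: error of the alternative estimate relative to the reference
    diffs = [a - r for r, a in pairs]
    bias = mean(diffs)
    sd = stdev(diffs)
    # Assumed convention: limits of agreement at bias +/- 1.96 SD
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits


# Hypothetical concentrations, not data from the study
ref_conc = [10.0, 20.0, 30.0]
alt_conc = [12.0, 19.0, 33.0]
means, diffs, bias, limits = bland_altman(ref_conc, alt_conc)
print(means, diffs, bias, limits)
```

Plotting `diffs` against `means` would show whether the error grows with concentration, which is the pattern the abstract reports for higher pollutant concentrations.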

Summary

Introduction

A growing number of population-based studies rely on geocoding (i.e., assignment of x and y coordinates (latitude and longitude)) to assign spatial exposure estimates [1,2,3,4]. Despite this heavy reliance on geocoding methods, relatively few epidemiologic studies examine and report geocoding errors in substantive detail. Many studies in the geography literature have evaluated issues in geocoding, including missingness (i.e., unmatched addresses) and positional accuracy (i.e., accuracy of x,y assignment, in distance or direction) [11,12,13,14]. We found more geography studies of distance error (i.e., Euclidean distance displacement from the reference point) [8,11,15,16] than of directional error (i.e., cardinal direction of displacement) [8], and only a few discussing or quantifying spatial clustering in error [17,18].
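
The two positional-accuracy measures defined above, distance error and directional error, can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' procedure: the function name `displacement`, the eight-sector compass binning, and the assumption that coordinates are already projected to a planar system in meters (x = easting, y = northing) are choices made here.

```python
import math

# Eight compass sectors, clockwise from north (assumed binning)
DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def displacement(ref_xy, alt_xy):
    """Return (distance_error_m, cardinal_direction) of an alternative
    geocode relative to the reference geocode.

    Both points are (x, y) tuples in a projected coordinate system in
    meters; raw latitude/longitude would need projecting first.
    """
    dx = alt_xy[0] - ref_xy[0]
    dy = alt_xy[1] - ref_xy[1]
    # Distance error: Euclidean displacement from the reference point
    distance = math.hypot(dx, dy)
    if distance == 0:
        return 0.0, None  # no displacement, so no direction
    # Bearing in degrees clockwise from north
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    # Each sector spans 45 degrees, centered on its compass point
    sector = int((bearing + 22.5) // 45) % 8
    return distance, DIRECTIONS[sector]


# Hypothetical coordinates: alternative geocode 3 m east and 4 m north
print(displacement((0.0, 0.0), (3.0, 4.0)))  # (5.0, 'NE')
```

Applying this to every address that has both a reference and an alternative geocode gives the per-address errors from which means, standard deviations, and spatial patterns can then be summarized.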

Objectives

Methods

Results

Discussion

Conclusion