Abstract

Background: Monitoring of environmental contaminants is a critical part of exposure science and epidemiological research. Missing data are often encountered during short-term monitoring (<24 hr) of air pollutants with real-time monitors, especially in resource-limited areas. How best to handle consecutive periods of missing and incomplete data in this context remains unclear. Our aim is to evaluate existing imputation methods for handling missing data from real-time monitors operating for short durations.

Methods: In an ongoing field study, real-time particulate monitors were placed outside 20 households for 24 hours. Missing data were simulated as consecutive periods at four levels of missingness (20%, 40%, 60%, 80%). Univariate (Mean, Median, Last Observation Carried Forward, Kalman Filter, Random, Markov) and multivariate time-series (Predictive Mean Matching, Row Mean Method) methods were used to impute missing concentrations, and performance was evaluated using five error metrics (Absolute Bias, Percent Absolute Error in Means, R² Coefficient of Determination, Root Mean Square Error, Mean Absolute Error).

Results: The univariate Markov, random, and mean imputation methods performed best, yielding 24-hour mean concentrations with low error and high R² values across all levels of missingness. When error metrics were evaluated minute by minute, Kalman filter, median, and Markov methods performed well at low levels of missingness (20-40%). At higher levels of missingness (60-80%), however, Markov, random, median, and mean imputation performed best on average. Multivariate imputation methods performed worst across all levels of missingness.

Conclusion: Epidemiological studies often relate pollutant concentrations to their potential health effects by averaging minute or hourly concentrations over 24 hours. However, when more than 25% of the data are missing, daily average pollutant concentrations cannot be reliably computed. Univariate imputation may provide a reasonable solution for addressing missing data in short-term monitoring of air pollutants. Further efforts are needed to evaluate imputation methods that are generalizable across a diverse range of study environments.
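
The evaluation design described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the synthetic 24-hour series, the function names, and the choice to blank a single contiguous block per run are all assumptions made for illustration, and only three of the simpler univariate methods (mean, median, Last Observation Carried Forward) and three of the error metrics are shown.

```python
import numpy as np

def mask_consecutive(series, fraction, rng):
    """Copy `series` and set one contiguous block (`fraction` of its length) to NaN."""
    n = len(series)
    gap = int(round(fraction * n))
    start = rng.integers(0, n - gap + 1)  # upper bound is exclusive
    masked = series.astype(float).copy()
    masked[start:start + gap] = np.nan
    return masked

def impute_mean(x):
    """Replace missing values with the mean of the observed values."""
    return np.where(np.isnan(x), np.nanmean(x), x)

def impute_median(x):
    """Replace missing values with the median of the observed values."""
    return np.where(np.isnan(x), np.nanmedian(x), x)

def impute_locf(x):
    """Last Observation Carried Forward; a leading gap takes the first observed value."""
    observed = ~np.isnan(x)
    # Index of the most recent observed sample at each position.
    idx = np.where(observed, np.arange(len(x)), 0)
    np.maximum.accumulate(idx, out=idx)
    out = x[idx]
    first = np.flatnonzero(observed)[0]
    out[:first] = x[first]
    return out

def rmse(truth, est):
    return float(np.sqrt(np.mean((truth - est) ** 2)))

def mae(truth, est):
    return float(np.mean(np.abs(truth - est)))

def pct_abs_error_in_means(truth, est):
    return float(abs(est.mean() - truth.mean()) / truth.mean() * 100)

# Synthetic stand-in for one household's minute-level record (NOT study data).
rng = np.random.default_rng(0)
minutes = np.arange(24 * 60)
truth = 20 + 5 * np.sin(2 * np.pi * minutes / len(minutes)) + rng.normal(0, 1, len(minutes))

methods = {"mean": impute_mean, "median": impute_median, "locf": impute_locf}
for fraction in (0.2, 0.4, 0.6, 0.8):
    masked = mask_consecutive(truth, fraction, rng)
    for name, impute in methods.items():
        est = impute(masked)
        print(f"{int(fraction * 100)}% {name:>6}: "
              f"RMSE={rmse(truth, est):.2f} MAE={mae(truth, est):.2f} "
              f"%err(mean)={pct_abs_error_in_means(truth, est):.2f}")
```

Because the complete series is known before masking, each imputed series can be scored both minute by minute (RMSE, MAE) and on the 24-hour mean, which mirrors the two levels of comparison reported in the Results. A real evaluation would repeat the masking across households and gap positions rather than rely on a single run.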
