TPS 691: Methods of measurement, design and data analysis, Exhibition Hall, Ground floor, August 28, 2019, 3:00 PM - 4:30 PM
Background: Missing data are a persistent challenge in epidemiological studies of air pollution, because air quality monitoring stations typically have gaps in their records. A promising tool for spatiotemporal imputation of missing environmental data is the Distributed Space-Time Expectation-Maximization (D-STEM) software. D-STEM provides a general framework for concurrent modeling and missing data imputation of air pollution data: in the iterative steps of the Expectation-Maximization algorithm, it replaces missing data with the predictions of a specified hierarchical space-time model.
Methods: We calibrated the D-STEM predictions at each monitoring station with a regression model, taking the observed data as the response and the corresponding D-STEM prediction as the only predictor. To compare the performance of the imputation algorithms, we ran a simulation study that induced an extra 10% of missing data and was repeated 50 times. Two underlying hierarchical space-time models were considered: (a) the simplest D-STEM model (a space-time model with just an intercept), and (b) an enriched model with a set of carefully selected predictors. These models were applied to fine particulate matter data measured at 30 fixed monitoring stations in Tehran, Iran, where 48% of the data were missing. Imputation performance was compared using the Mean Absolute Percentage Error (MAPE).
Results: With the original D-STEM spatiotemporal imputation, the MAPEs of the simplest and enriched models were 26.1% and 25.3%, respectively. With the proposed calibration, the MAPEs of these two models were 20.6% and 20.4%, respectively.
Conclusions: The proposed calibration method reduced the MAPE of spatiotemporal missing data imputation by about five percentage points for both underlying models, which could benefit future exposure assessment and epidemiological studies of air pollutants.
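The per-station calibration and the MAPE criterion described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation and does not call D-STEM: the function names (`calibrate_station`, `impute`, `mape`), the simple linear form of the calibrating regression, and the synthetic data standing in for D-STEM predictions are all assumptions made for illustration, and only a single 10% hold-out is shown rather than the 50 repetitions of the study.

```python
import numpy as np

def calibrate_station(observed, predicted):
    """Fit observed = a + b * predicted on the non-missing entries of one
    station and return the calibrated predictions for every time point.
    Assumption: the calibrating regression is a simple linear model."""
    mask = ~np.isnan(observed)
    # np.polyfit returns coefficients from highest degree down: slope, intercept.
    b, a = np.polyfit(predicted[mask], observed[mask], deg=1)
    return a + b * predicted

def impute(observed, fill):
    """Keep observed values and use `fill` only where data are missing."""
    return np.where(np.isnan(observed), fill, observed)

def mape(actual, estimate):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * np.mean(np.abs((actual - estimate) / actual))

# Synthetic single-station demo (hypothetical values, not the Tehran data).
rng = np.random.default_rng(0)
n = 200
truth = rng.uniform(20.0, 80.0, size=n)
# Stand-in for space-time model predictions: systematically biased plus noise.
predicted = 5.0 + 0.8 * truth + rng.normal(0.0, 1.0, size=n)

# Induce 10% extra missing data, as in the simulation design.
held_out = rng.choice(n, size=n // 10, replace=False)
observed = truth.copy()
observed[held_out] = np.nan

calibrated = calibrate_station(observed, predicted)
imputed_raw = impute(observed, predicted)
imputed_cal = impute(observed, calibrated)

# Score both imputations on the held-out values only.
raw_mape = mape(truth[held_out], imputed_raw[held_out])
cal_mape = mape(truth[held_out], imputed_cal[held_out])
print(f"MAPE without calibration: {raw_mape:.1f}%")
print(f"MAPE with calibration:    {cal_mape:.1f}%")
```

In this sketch the calibration helps because the stand-in predictions carry a station-level bias that a linear regression can remove; how much it helps on real data depends on how the underlying space-time model errs at each station.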