Accurate exposure assessment is important for epidemiological studies of PM10-2.5, which have been limited thus far. In this study, we aimed to develop an ensemble machine learning method to estimate PM10-2.5 concentrations in mainland China during 2013-2020. The study was conducted in two stages. In the first stage, we developed two methods: an indirect method, which models PM2.5 and PM10 separately and then calculates PM10-2.5 as the difference between the two predictions; and a direct method, which models PM10-2.5 measurements against relevant predictors directly. In the second stage, we built an ensemble method that integrates predictions from both the indirect and direct methods. Internal and external cross-validation (CV) were performed to evaluate the extrapolation capacity of the models. The ensemble method achieved higher extrapolation accuracy than either the indirect or the direct method in both internal and external CV. Its predictions captured the spatiotemporal pattern of PM10-2.5, even during sand and dust storm seasons. Our study introduces an ensemble strategy that leverages the strengths of both indirect and direct methods to estimate PM10-2.5 concentrations and holds significant potential to support future epidemiological studies addressing knowledge gaps in the health effects of PM10-2.5.
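
The two-stage idea can be illustrated with a minimal Python sketch. It is not the study's implementation: the abstract does not say how the ensemble integrates the two sets of predictions, so the single least-squares blending weight used here is an assumption, as are the flooring of negative differences at zero and all function names and numbers.

```python
from typing import List, Sequence


def indirect_estimate(pm10: Sequence[float], pm25: Sequence[float]) -> List[float]:
    """Indirect method: PM10-2.5 as predicted PM10 minus predicted PM2.5.

    Flooring at zero is an assumption of this sketch, since a
    concentration cannot be negative.
    """
    return [max(p10 - p25, 0.0) for p10, p25 in zip(pm10, pm25)]


def fit_weight(indirect: Sequence[float], direct: Sequence[float],
               observed: Sequence[float]) -> float:
    """Least-squares weight w for w*indirect + (1-w)*direct, clipped to [0, 1].

    A hypothetical stand-in for the ensemble stage, not the paper's model.
    """
    num = sum((a - b) * (y - b) for a, b, y in zip(indirect, direct, observed))
    den = sum((a - b) ** 2 for a, b in zip(indirect, direct))
    if den == 0.0:
        return 0.5  # both methods agree everywhere; any weight is equivalent
    return min(1.0, max(0.0, num / den))


def ensemble_estimate(indirect: Sequence[float], direct: Sequence[float],
                      weight: float) -> List[float]:
    """Blend the indirect and direct predictions with a fixed weight."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(indirect, direct)]


# Made-up illustrative values in micrograms per cubic metre.
pm10_pred = [80.0, 50.0, 30.0]
pm25_pred = [50.0, 30.0, 35.0]
direct_pred = [20.0, 30.0, 10.0]
observed = [25.0, 25.0, 5.0]

indirect_pred = indirect_estimate(pm10_pred, pm25_pred)
w = fit_weight(indirect_pred, direct_pred, observed)
blended = ensemble_estimate(indirect_pred, direct_pred, w)
```

In practice the weight would be fitted on held-out monitoring data rather than on the data used to train the first-stage models, which is the role cross-validation plays in the study.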