BackgroundStatistical and machine learning models are commonly used to estimate spatial and temporal variability in exposure to environmental stressors, supporting epidemiological studies. We aimed to compare the performances, strengths and limitations of six different algorithms in the retrospective spatiotemporal modeling of daily birch and grass pollen concentrations at a spatial resolution of 1 km across Switzerland. MethodsDaily birch and grass pollen concentrations were available from 14 measurement sites in Switzerland for 2000–2019. To develop the spatiotemporal models, we considered spatiotemporal, spatial and temporal predictors including meteorological factors, land-use, elevation, species distribution and Normalized Difference Vegetation Index (NDVI). We used six statistical and machine learning algorithms: LASSO, Ridge, Elastic net, Random forest, XGBoost and ANNs. We optimized model structures through feature selection and grid search techniques to obtain the best predictive performance. We used train-test split and cross-validation to avoid overfitting and overoptimistic performance indicators. We then combined these six models through multiple linear regression to develop an ensemble hybrid model. ResultsThe 5th-95th percentiles of birch and grass pollen concentrations were 0–151 and 0–105 grains/m3, respectively. The hybrid ensemble model achieved the best RMSE on the test dataset for both birch and grass pollen with 94.4 and 19.7 grains/m3, respectively.Nonlinear models (Random forest, XGBoost and ANNs) achieved lower test RMSE's than linear models (LASSO, Ridge, Elastic net) for both pollen types, with RMSE's ranging from 105.9 to 140.5 grains/m3 for birch and from 20.0 to 25.4 grains/m3 for grass pollen. The Random forest algorithm yielded the best spatial and temporal performance among the six evaluated modelling methods. The ensemble hybrid model outperformed the six linear and nonlinear algorithms. Country-wide pollen concentration, land use, weather, and NDVI were important predictors. ConclusionNonlinear algorithms outperformed linear models and accurately explained complex, nonlinear relationships between environmental factors and measured concentrations.
Read full abstract