Abstract

We test three methods for predicting ozone in the El Paso (ELP) and Houston-Galveston-Brazoria (HGB) regions of Texas from 2005 to 2019: (1) a generalized additive model (GAM) approach; (2) a GAM approach with the addition of the Synthetic Minority Over-sampling TEchnique (SMOTE); and (3) a tail dependence modeling approach based on extreme value theory (EVT). We also compare the feature selection capabilities of the tail dependence approach to other feature selection methods. We find that the GAM+SMOTE model outperformed the GAM-only model on the root mean square error (RMSE) metric, particularly for above-threshold ozone values, which may be particularly useful for extreme ozone event prediction. However, the improvement in above-threshold prediction of maximum daily 8-hour average ozone (MDA8 O3) with the GAM+SMOTE method tends to come at the cost of below-threshold prediction, which is particularly important if MDA8 O3 trends are of interest. We also find that the tail dependence approach is capable of predicting extreme ozone events, but algorithmic stability and configuration complexity can make it difficult to operationalize on a broad scale, and the threshold must be selected carefully. Finally, feature selection via the tail dependence method performs comparably to other machine learning-based feature selection methods, and we find that multiple parameter sets can predict MDA8 O3 with equal success.
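As a rough illustration of what "tail dependence" measures, the sketch below estimates the standard coefficient chi(u) = P(F_Y(Y) > u | F_X(X) > u) from ranks. It is not the estimator used in the study, which the abstract does not specify; the function name `empirical_chi`, the quantile level `u`, and the toy data are assumptions made for this example.

```python
import numpy as np

def empirical_chi(x, y, u):
    """Rank-based estimate of chi(u) = P(F_Y(Y) > u | F_X(X) > u).

    Illustrative only: ties are broken arbitrarily by the rank transform.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Transform each margin to pseudo-uniform scores in (0, 1) via ranks.
    rank_x = (np.argsort(np.argsort(x)) + 1) / (n + 1)
    rank_y = (np.argsort(np.argsort(y)) + 1) / (n + 1)
    x_extreme = rank_x > u
    if not x_extreme.any():
        return float("nan")  # no observations above the chosen quantile
    # Fraction of x-extremes for which y is also extreme.
    return float(np.mean(rank_y[x_extreme] > u))

# Toy data (hypothetical): a predictor and two candidate responses.
predictor = np.arange(1.0, 11.0)
chi_same = empirical_chi(predictor, predictor, u=0.8)           # extremes coincide
chi_reversed = empirical_chi(predictor, predictor[::-1], u=0.8)  # extremes never coincide
```

Values of chi(u) near 1 at high `u` indicate that extremes of the two variables tend to occur together, while values near 0 indicate that they do not. The estimate depends on the choice of `u`, which is consistent with the abstract's caution about threshold selection.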

Highlights

  • Publisher: Taiwan Association for Aerosol Research. ISSN: 1680-8584 (print); ISSN: 2071-1409 (online). Copyright: The Author(s)

  • Overall, compared to the generalized additive model (GAM)-only model, the GAM + Synthetic Minority Over-sampling TEchnique (SMOTE) model: (1) did not substantially change the R2 values; (2) had varying impacts on the root mean square error (RMSE) when all test samples were included (RMSE_All); and (3) consistently reduced the RMSE for above-threshold test samples (RMSE_Highest) at the expense of the lowest ozone values. From this we conclude that the SMOTE procedure improves the ability of the GAM regression to predict above-threshold ozone values, but that this improvement comes at the expense of the below-threshold values.

  • We find that the GAM+SMOTE method consistently outperforms the GAM-only method in terms of RMSE, but not in terms of R2.
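The split error metrics named in the highlights can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `rmse_by_threshold`, the 70 ppb threshold, and the sample values are assumptions made for the example, since the threshold is not stated here.

```python
import numpy as np

def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def rmse_by_threshold(obs, pred, threshold):
    """RMSE over all samples and over samples split at an observed-value threshold."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    above = obs > threshold  # samples whose observed value exceeds the threshold
    nan = float("nan")
    return {
        "rmse_all": rmse(obs, pred),
        "rmse_above": rmse(obs[above], pred[above]) if above.any() else nan,
        "rmse_below": rmse(obs[~above], pred[~above]) if (~above).any() else nan,
    }

# Hypothetical observed and predicted MDA8 O3 values (ppb).
observed = np.array([40.0, 55.0, 62.0, 75.0, 82.0])
predicted = np.array([42.0, 53.0, 60.0, 69.0, 74.0])
scores = rmse_by_threshold(observed, predicted, threshold=70.0)
```

Reporting the above-threshold and below-threshold errors separately makes the trade-off described in the highlights visible, because a single pooled RMSE can hide a gain at the high end that is offset by a loss at the low end.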


Summary

Introduction

Surface ozone concentrations and their related detrimental health effects (Brunekreef et al., 2002) have been decreasing throughout the United States (U.S.) (Cooper et al., 2012; Fleming et al., 2018), due primarily to reductions in ozone precursors including nitrogen oxides (NOx = NO + NO2) and carbon monoxide (CO) (Granier et al., 2011). The Environmental Protection Agency (EPA) has prioritized reductions in high

