Abstract

Ground-level ozone is a pollutant that is harmful to urban populations, particularly in developing countries where it is present in significant quantities. It greatly increases the risk of heart and lung diseases and harms agricultural crops. This study hypothesized that, as a secondary pollutant, ground-level ozone is amenable to 24 h forecasting based on measurements of weather conditions and primary pollutants such as nitrogen oxides and volatile organic compounds. We developed software to analyze hourly records of 12 air pollutants and 5 weather variables over the course of one year in Delhi, India. To determine the best predictive model, eight machine learning algorithms were tuned, trained, tested, and compared using cross-validation with hourly data for a full year. Ranked by R² values, seven of the algorithms were XGBoost (0.61), Random Forest (0.61), K-Nearest Neighbor Regression (0.55), Support Vector Regression (0.48), Decision Trees (0.43), AdaBoost (0.39), and linear regression (0.39). When the models were trained separately by season across five years, the predictive capabilities of all models increased, with a maximum R² of 0.75 during winter. The eighth algorithm, Bidirectional Long Short-Term Memory, was the least accurate model for annual training but produced some of the best predictions for seasonal training. Out of five air quality index categories, the XGBoost model predicted the correct category 24 h in advance 90% of the time when trained with full-year data. Separated by season, winter was the most predictable (97.3%), followed by post-monsoon (92.8%), monsoon (90.3%), and summer (88.9%). These results show the importance of training machine learning methods with season-specific data sets and of comparing a large number of methods for specific applications.
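The two evaluation measures named in the abstract, R² and the share of correctly predicted air quality index categories, can be illustrated with a minimal Python sketch. It is not the authors' software. The function names, the toy values, and especially the category edges in CATEGORY_EDGES are hypothetical placeholders chosen for illustration; they are not the breakpoints used in the study.

```python
from bisect import bisect_right


def make_24h_pairs(features, ozone, horizon=24):
    """Pair the feature row at hour t with the ozone value at hour t + horizon."""
    return [(features[t], ozone[t + horizon]) for t in range(len(ozone) - horizon)]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# ASSUMPTION: four illustrative edges that split ozone values into five
# categories. These are placeholders, not the study's AQI breakpoints.
CATEGORY_EDGES = [50, 100, 170, 210]


def category(value, edges=CATEGORY_EDGES):
    """Index (0..len(edges)) of the category that contains value.

    A value exactly equal to an edge falls into the higher category.
    """
    return bisect_right(edges, value)


def category_accuracy(observed, predicted, edges=CATEGORY_EDGES):
    """Fraction of hours whose predicted category matches the observed one."""
    hits = sum(
        category(o, edges) == category(p, edges)
        for o, p in zip(observed, predicted)
    )
    return hits / len(observed)


if __name__ == "__main__":
    # Toy values standing in for observed and forecast ozone concentrations.
    observed = [40, 60, 120, 180, 220]
    predicted = [45, 55, 95, 175, 230]
    print("R^2:", round(r_squared(observed, predicted), 3))
    print("category accuracy:", category_accuracy(observed, predicted))
```

In this sketch a forecast can score a high R² and still miss a category when the observed value lies near an edge, which is one reason the two measures are reported separately.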

Highlights

  • Unlike primary pollutants that are directly emitted by human activities, such as carbon monoxide (CO), nitrogen oxides (NOx), and volatile organic compounds (VOCs), secondary pollutants cannot be directly reduced or appropriately regulated

Introduction

According to the World Health Organization, air pollution is a leading cause of premature death, responsible for approximately 4.2 million deaths annually worldwide from lung cancer, heart disease, respiratory diseases, and other conditions [1]. One harmful pollutant is tropospheric ozone (O₃), or “ground-level ozone”, which is produced when nitrogen oxides (NOx) and volatile organic compounds (VOCs) react in the presence of sunlight and heat. The combination of precursor pollutants and varying environmental conditions makes ozone difficult to predict with traditional approaches. This makes it a strong candidate for machine learning (ML) methods, which can provide both forecasting capabilities and deeper insight into the causes of high ozone levels. This information can help regulatory agencies limit emissions of NOx and VOCs during high-risk periods. Machine learning methods can improve the accuracy of predictive warning systems of pollutants so that residents may avoid outdoor activity, the elderly and those with respiratory problems
