Design of a Spark Big Data Framework for PM2.5 Air Pollution Forecasting.

Dong-Her Shih, Ly Sy Phu Nguyen, Ting-Wei Wu, Wen-Ting You, Thi Hien To

doi:10.3390/ijerph18137087

Abstract

In recent years, with rapid economic development, air pollution has become extremely serious, with many negative effects on health, the environment and medical costs. PM2.5 is one of the main components of air pollution, so it is important for public health to know PM2.5 air quality in advance. Many air quality studies rely on the government’s official monitoring stations, which cannot be widely deployed because of their high cost. Furthermore, government stations update only once an hour, which makes it hard to capture short-term PM2.5 concentration peaks that arrive with little warning. Short-interval data from many stations, however, is huge in volume, and computing, analyzing and predicting from it is complex. Distributing the computation alleviates the high computational requirements of the original predictor, which makes Spark suitable for the problem considered. This study proposes a PM2.5 instant prediction architecture based on the Spark big data framework to handle the huge volume of data from the LASS community. The proposed framework is divided into three modules. It collects real-time PM2.5 data and performs ensemble learning with three machine learning algorithms (Linear Regression, Random Forest, Gradient Boosting Decision Tree) to predict the PM2.5 concentration over the next 30 to 180 min, with an accompanying visualization graph. The experimental results show that the proposed Spark big data ensemble prediction model performs best for the next 30-min prediction (R2 up to 0.96), and that the ensemble model outperforms any single machine learning model. Taiwan has long suffered from relatively poor air quality. Air pollutant monitoring data from the LASS community can provide broader monitoring coverage; however, the data is large and difficult to integrate or analyze. The proposed Spark big data framework system can provide short-term PM2.5 forecasts and help decision-makers take proper action immediately.
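
The comparison between single models and their ensemble, scored by R2, can be illustrated with a small sketch. This is plain Python, not Spark code, and it is not the authors' implementation: the abstract does not say how the three models are combined, so unweighted averaging is an assumption here, and all readings and predictions below are hypothetical values made up for illustration.

```python
from statistics import mean


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def ensemble_average(*model_predictions):
    """Combine per-model predictions by unweighted averaging (assumed rule)."""
    return [mean(values) for values in zip(*model_predictions)]


# Hypothetical PM2.5 readings and hypothetical outputs of three fitted models.
actual = [10.0, 20.0, 30.0, 40.0]
lr_pred = [13.0, 19.0, 33.0, 37.0]    # stands in for Linear Regression
rf_pred = [8.0, 22.0, 27.0, 42.0]     # stands in for Random Forest
gbdt_pred = [9.0, 22.0, 30.0, 41.0]   # stands in for Gradient Boosting Decision Tree

ensemble_pred = ensemble_average(lr_pred, rf_pred, gbdt_pred)

scores = {
    "LR": r2_score(actual, lr_pred),
    "RF": r2_score(actual, rf_pred),
    "GBDT": r2_score(actual, gbdt_pred),
    "Ensemble": r2_score(actual, ensemble_pred),
}
for name, score in scores.items():
    print(f"{name}: R2 = {score:.3f}")
```

The toy predictions were chosen so that the individual errors partly cancel, which is why the average scores highest here. Averaging is not guaranteed to beat the best single model on real data.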

Highlights

In recent years, with rapid economic development, air pollution has become increasingly serious, causing many negative effects on health, environment and medical costs
This study uses PM2.5 sensor data provided by the open source community Location Aware Sensing System (LASS), together with the Spark big data computing framework and machine learning algorithms, to build a real-time prediction model of PM2.5 concentration for early warning and air pollution monitoring
The effectiveness evaluation results of the four algorithms (linear regression, random forest regression, gradient boosting, and their ensemble) for predicting the PM2.5 concentration value y(t + 1) from training data are shown in Table 5, which is used to check whether the ensemble outperforms any single regression model
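
The y(t + 1) target mentioned in the highlights can be read as a one-step-ahead forecast: the model sees readings up to time t and predicts the next one. The sketch below shows one common way to turn a sensor series into such training pairs. The function name `make_supervised`, the lag count, and the readings are hypothetical, since this excerpt does not state the features or sampling interval the authors used.

```python
def make_supervised(series, n_lags, horizon=1):
    """Turn a time-ordered series into (features, target) pairs.

    features: the n_lags most recent readings, ending at time t
    target:   the reading `horizon` steps after t, i.e. y(t + horizon)
    """
    rows = []
    for t in range(n_lags - 1, len(series) - horizon):
        features = series[t - n_lags + 1 : t + 1]
        target = series[t + horizon]
        rows.append((features, target))
    return rows


# Hypothetical PM2.5 readings at a fixed sampling interval.
readings = [12, 15, 14, 18, 21, 19]

# One-step-ahead pairs: three past readings predict y(t + 1).
pairs = make_supervised(readings, n_lags=3, horizon=1)
for features, target in pairs:
    print(features, "->", target)
```

If one step were taken to be 30 minutes (an assumption), setting `horizon` to values from 1 through 6 would cover the 30 to 180 min range mentioned in the abstract.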

Summary

Introduction

With rapid economic development, air pollution has become increasingly serious, causing many negative effects on health, the environment and medical costs. The World Health Organization’s (WHO) report mentions that for about three-quarters of the world’s population, the air pollution concentrations of their living environments exceed the limits specified by the WHO, and that indoor and outdoor air pollution causes about 7 million premature deaths every year [1]. Air pollution can cause many diseases and negatively affect human health. Martinelli, Olivieri and Girelli [2] have pointed out that exposure to fine suspended particulates (PM2.5) can lead to an increase in the incidence of cardiovascular diseases. IARC [3] mentioned that exposure to outdoor air pollution can cause lung cancer and increase the risk of bladder cancer and breast cancer.

Journal: International Journal of Environmental Research and Public Health
Publication Date: Jul 2, 2021
Citations: 10
License type: CC BY 4.0
