Abstract

Time series data in practical applications often contain missing values due to sensor malfunction, network failure, and outliers. To handle missing values in time series, and to address the fact that many machine learning models do not consider temporal properties, we propose a spatiotemporal prediction framework based on missing value processing algorithms and a deep recurrent neural network (DRNN). Using a missing tag and a missing interval to represent time series patterns, we implement three missing value fixing algorithms, which are incorporated into a deep neural network consisting of LSTM (Long Short-Term Memory) layers and fully connected layers. Real-world air quality and meteorological datasets (Jingjinji area, China) are used for model training and testing. Deep feed forward neural networks (DFNN) and gradient boosting decision trees (GBDT) are trained as baseline models against the proposed DRNN. The performance of the three missing value fixing algorithms and of the different machine learning models is evaluated and analysed. Experiments show that the proposed DRNN framework outperforms both DFNN and GBDT, validating the capacity of the proposed framework. Our results also provide useful insights into different strategies for handling missing values.

Highlights

  • Air pollution remains a serious concern in developing countries such as China and India and has attracted much attention

  • The models that are implemented and evaluated fall into three groups: (a) Non-RNN Machine Learning Baselines: We evaluate gradient boosting decision trees (GBDT), which are widely used for both regression and classification problems and outperform many other models in generalization ability. (b) Non-RNN Deep Learning Baselines: As baselines, we take deep feed forward neural networks (DFNN) with the same number of layers as the deep recurrent neural networks that we propose. (c) Proposed Deep Learning Methods: This is our proposed model based on LSTM

  • On top of three kinds of missing value fixing algorithms, we propose two deep recurrent neural networks based on LSTM (DRNN-1 and DRNN-2)
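
The missing tag and missing interval mentioned in the abstract can be illustrated with a small sketch. The definitions below are assumptions made for illustration, not taken from the paper: the tag is 1 where a reading is missing and 0 otherwise, and the interval counts the steps since the last observed reading. The helper names (`missing_tag_and_interval`, `forward_fill`) and the sample readings are hypothetical, and forward filling is only one common repair strategy, not necessarily one of the three algorithms the authors implement.

```python
from typing import List, Optional, Tuple


def missing_tag_and_interval(
    series: List[Optional[float]],
) -> Tuple[List[int], List[int]]:
    """Return (tags, intervals) for a series where None marks a missing value.

    Assumed definitions (illustrative, not from the paper):
      tags[t]      = 1 if series[t] is missing, else 0
      intervals[t] = steps since the last observed value (0 when observed)
    """
    tags: List[int] = []
    intervals: List[int] = []
    gap = 0
    for value in series:
        if value is None:
            gap += 1          # one more step without an observation
            tags.append(1)
        else:
            gap = 0           # an observation resets the interval
            tags.append(0)
        intervals.append(gap)
    return tags, intervals


def forward_fill(
    series: List[Optional[float]], default: float = 0.0
) -> List[float]:
    """Replace each missing value with the most recent observed value.

    `default` is used for gaps at the very start of the series.
    """
    filled: List[float] = []
    last = default
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical hourly PM2.5 readings; None marks a missing reading.
pm25 = [35.0, None, None, 42.0, None, 38.0]
tags, intervals = missing_tag_and_interval(pm25)
filled = forward_fill(pm25)

# One (value, tag, interval) triple per time step.
features = list(zip(filled, tags, intervals))
```

Each time step then carries the repaired value together with its tag and interval, so a recurrent model can tell an observed reading from an imputed one and see how stale the imputed value is.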


Summary

Introduction

Air pollution remains a serious concern in developing countries such as China and India and has attracted much attention. Typical sources of air pollution include industrial and traffic emissions, and the main pollutants include PM2.5, PM10, NO2, SO2, and O3. Among these pollutants, PM2.5 has attracted particular attention. The correlation between health risk and the concentration of air pollutants has been studied (Stieb et al, 2008; Chen et al, 2013). Organizations and governments, such as the World Health Organization (WHO, 2006), the US Environmental Protection Agency (Laden et al, 2000), and Japan (Wakamatsu et al, 2013), have implemented policies to support air pollution countermeasures.
