An implementation of cloud-based platform with R packages for spatiotemporal analysis of air pollution

Chao-Tung Yang, Ben-Shen Lou, Yu-Wei Chan, Jung-Chun Liu

doi:10.1007/s11227-017-2189-1

Abstract

Recently, R has become a popular tool for big data analysis thanks to its many mature packages for data analysis and visualization, including the analysis of air pollution. Air pollution is of increasing global concern because it greatly impacts the environment and human health. With the rapid development of IoT and the increasing accuracy of geographical information collected by sensors, a huge amount of air pollution data has been generated. It is therefore difficult to analyze air pollution data effectively and reliably in a single-machine environment, because of the memory constraints inherent in such an environment. In this work, we construct a distributed computing environment based on both RHadoop and SparkR to perform the analysis and visualization of air pollution with R more reliably and effectively. We first use sensors called EdiGreen AirBox to collect air pollution data in Taichung, Taiwan. Then, we adopt the Inverse Distance Weighting (IDW) method to transform the sensors’ data into a density map. Finally, the experimental results show the accuracy of short-term PM2.5 predictions made with the ARIMA model. The prediction accuracy is also verified using the mean absolute percentage error (MAPE).
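
As a rough illustration of two of the methods named in the abstract, the following minimal Python sketch shows textbook Inverse Distance Weighting and MAPE. It is not the authors' RHadoop/SparkR implementation. The function names `idw` and `mape`, the default power of 2, and all sample values are assumptions made for illustration only.

```python
import math


def idw(points, query, power=2.0):
    """Estimate a value at `query` from (x, y, value) samples.

    Each sample is weighted by 1 / distance**power (assumed power = 2).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in points:
        distance = math.hypot(query[0] - x, query[1] - y)
        if distance == 0.0:
            # The query coincides with a sensor: return its reading directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical readings: (x, y, PM2.5) for two sensors.
    sensors = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
    print(idw(sensors, (1.0, 0.0)))            # midpoint between the two sensors
    print(mape([100.0, 200.0], [110.0, 180.0]))  # hypothetical forecast check
```

Evaluating `idw` over a regular grid of query points would yield the kind of density map the abstract describes. A lower `mape` value indicates a closer match between predicted and observed PM2.5.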

Full Text