Abstract

Recently, R has become a popular tool for big data analysis owing to its many mature packages for data analysis and visualization, including the analysis of air pollution. Air pollution is of increasing global concern because it greatly impacts the environment and human health. With the rapid development of IoT and the increasing accuracy of the geographical information collected by sensors, a huge amount of air pollution data is being generated. It is therefore difficult to analyze air pollution data effectively and reliably in a single-machine environment, owing to the inherent memory constraints of such an environment. In this work, we construct a distributed computing environment based on both RHadoop and SparkR to perform the analysis and visualization of air pollution with R more reliably and effectively. We first use EdiGreen AirBox sensors to collect air pollution data in Taichung, Taiwan. Then, we adopt the Inverse Distance Weighting (IDW) method to transform the sensor data into a density map. Finally, the experimental results show the accuracy of short-term PM2.5 predictions made with the ARIMA model, and the prediction accuracy is verified using the mean absolute percentage error (MAPE).
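The two formulas named in the abstract, Inverse Distance Weighting and MAPE, can be illustrated with a minimal sketch. This is not the authors' implementation, which is built on R with RHadoop and SparkR; it is written in plain Python only to show the standard definitions. The function names, the planar coordinates, the distance exponent of 2, and the sample readings are all assumptions made for illustration and do not come from the paper.

```python
import math


def idw(sensors, x, y, power=2.0):
    """Estimate a value at (x, y) by Inverse Distance Weighting.

    sensors: list of (sx, sy, value) tuples in planar coordinates (assumed).
    power:   distance exponent; 2.0 is a common default, assumed here.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in sensors:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            # The query point coincides with a sensor: return its reading.
            return value
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes equal-length sequences and no zero values in `actual`.
    """
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical sensor readings (x, y, PM2.5), invented for illustration.
sensors = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0)]

# A density map is the IDW estimate evaluated on a grid of points.
grid = [[idw(sensors, gx, gy) for gx in (0.0, 1.0, 2.0)] for gy in (0.0, 1.0)]

# Hypothetical observed and forecast values, invented for illustration.
error_percent = mape([100.0, 200.0], [110.0, 180.0])
```

In this sketch a lower MAPE indicates a closer forecast. The forecasts themselves would come from a fitted ARIMA model, which is not reproduced here.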

