Abstract

In many countries, accurately estimating AQI (Air Quality Index) values and levels is essential for monitoring air pollution in residential areas. This problem has been an active research topic for many years, and many applications have been developed for personal use. In this work, we investigate a multi-source machine learning approach to approximating local AQI scores at users' locations in a large city. We conduct experiments on three primary data sets, SEPHLA-MediaEval 2019, MNR-Air-HCM, and MNR-HCM, collected in Ho Chi Minh City (Vietnam) and Fukuoka City (Japan). From these data sets, we extract several types of attributes relevant to the problem:

- timestamp information;
- geographical data;
- sensor data (humidity and temperature);
- users' emotion tags (such as greenness and calmness);
- semantic features from images captured by users;
- public weather data for the corresponding cities (temperature, dew point, humidity, wind speed, and pressure).

We then compare five machine learning models for estimating local AQI scores and levels: Support Vector Machine [1], Random Forest [2], Extreme Gradient Boosting [3], LightGBM [4], and CatBoost [5]. We measure the performance of these approaches using RMSE, MAE, and R². The experimental results show that, in many cases, Random Forest trained on sensor data combined with public weather data achieves the best results for both AQI value regression and AQI level prediction.
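The abstract evaluates the models with RMSE, MAE, and R² but does not define them. Below is a minimal sketch of the standard definitions in plain Python, applied to paired true and predicted AQI values. The function name `regression_metrics` is hypothetical and is not taken from the paper. The authors' actual evaluation code, for example scikit-learn's metric functions, may handle edge cases differently. One such case is R² on a constant target, which this sketch returns as NaN.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: RMSE, MAE and R^2 for paired true/predicted AQI values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Root mean squared error: square root of the mean squared residual,
    # so large errors are penalised more heavily than small ones.
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)

    # Mean absolute error: average error magnitude, in AQI units.
    mae = sum(abs(r) for r in residuals) / n

    # Coefficient of determination: R^2 = 1 - SS_res / SS_tot, where SS_tot
    # is the total squared deviation of the true values from their mean.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"rmse": rmse, "mae": mae, "r2": r2}


if __name__ == "__main__":
    # Illustrative numbers only, not data from the paper.
    print(regression_metrics([50, 80, 120, 60], [55, 75, 110, 65]))
```

The three metrics capture different aspects of the error. Lower RMSE and MAE are better, and RMSE exceeds MAE when a few predictions are far off. R² compares the model against always predicting the mean AQI, so a value close to 1 means most of the variance is explained.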
