Classification of Indoor Air Pollution Using Low-cost Sensors by Machine Learning

Andrii Antonenko, Oleksandr Zagaria, Viacheslav Boretskij

doi:10.5194/egusphere-egu23-14856

Abstract

Air pollution has become an integral part of modern life. Its main source can be considered to be combustion processes associated with the activities of energy-intensive companies. Energy companies consume about one-third of the fuel produced and are a significant source of air pollution [1]. State and public air quality monitoring networks have been created to monitor the situation. Public monitoring networks are cheaper and provide wider coverage than government ones. Although the state monitoring system provides more accurate data, an inexpensive network is sufficient to inform the public about the presence or absence of pollution (air quality). With this goal in mind, the idea arose to test whether types of pollution can be detected using data from cheap air quality monitoring sensors. In general, before a cheap sensor can be used for measurements, it must first be calibrated (corrected) by comparing its readings with those of a reference device. Various mathematical methods can be used for this. One such method is neural network training, which has proven effective for correcting the impact of relative humidity on PM readings [2]. The idea of using a neural network to improve data quality is not new, but it is quite promising, as the authors showed in [3]. The main difficulty in implementing this method is obtaining a reliable dataset for training the network. This requires recording sensor readings both for relatively clean air and for artificially generated or known sources of pollution. A neural network trained on the collected data can then be used to classify air samples as polluted (containing a pollutant) or unpolluted. To this end, an experiment was set up in the "ReLab" co-working space at the Taras Shevchenko National University of Kyiv. The sensors were placed in a closed box with airflow ventilation. The ZPHS01B [4] sensor module was used for in-box measurements, together with calibrated PMS7003 [5] and BME280 [6] sensors.
Additionally, IPS 7100 [7] and SPS30 [8] sensors were added to enrich the dataset for ML training. A platform based on the HiLink 7688 was used for data collection, processing, and transmission. Each sensor was read independently every two seconds. Before each experiment, the room was ventilated so that residual pollution would not affect the next series of experiments.
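
The calibration step mentioned in the abstract (correcting a cheap sensor by comparing its readings with a reference device) can be illustrated with a minimal sketch. It assumes a simple linear relationship fitted by ordinary least squares, which is only one of the possible mathematical methods and is not the neural network approach of [2] or [3]. The function names and the paired readings below are hypothetical and synthetic, chosen purely for illustration.

```python
from statistics import mean


def fit_linear_calibration(raw, reference):
    """Least-squares fit of: reference ~= gain * raw + offset."""
    if len(raw) != len(reference) or len(raw) < 2:
        raise ValueError("need at least two paired readings")
    raw_mean, ref_mean = mean(raw), mean(reference)
    # Sum of squared deviations of the raw readings
    sxx = sum((x - raw_mean) ** 2 for x in raw)
    if sxx == 0:
        raise ValueError("raw readings are constant; gain is undefined")
    # Sum of cross deviations between raw and reference readings
    sxy = sum((x - raw_mean) * (y - ref_mean) for x, y in zip(raw, reference))
    gain = sxy / sxx
    offset = ref_mean - gain * raw_mean
    return gain, offset


def apply_calibration(raw, gain, offset):
    """Map raw sensor readings onto the reference scale."""
    return [gain * x + offset for x in raw]


# Synthetic paired readings (illustrative values, NOT measured data).
raw_pm25 = [10.0, 20.0, 30.0, 40.0]        # hypothetical low-cost sensor
reference_pm25 = [7.0, 12.0, 17.0, 22.0]   # hypothetical reference device

gain, offset = fit_linear_calibration(raw_pm25, reference_pm25)
corrected = apply_calibration(raw_pm25, gain, offset)
```

In practice the paired readings would come from co-locating the cheap sensor with the reference instrument, and a nonlinear model may be needed when humidity or other factors influence the response.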

Full Text