Calibration of integrated low-cost environmental sensors based on machine learning with multiple scenes

Fang Nan, Chao Zeng, Huanfeng Shen

doi:10.5194/egusphere-egu24-7377

Publication Date: Nov 27, 2024

Abstract

With increasing attention to urban temperature and outdoor thermal comfort, low-cost monitoring of urban microenvironments is an effective way to fill the spatiotemporal gaps of traditional monitoring networks. However, the widespread use of low-cost sensors has been hampered by uncertainty about their data quality. Calibrating low-cost sensors is key to promoting their large-scale application and increasing confidence in related research. The purpose of this study is to calibrate low-cost integrated environmental sensors and effectively improve their hourly data quality, based on an IoT case study in Wuhan, China.

Using 24 traditional weather stations operated by the meteorological regulatory authorities at different locations as the reference standard, this study applied eight machine learning (ML) algorithms to calibrate low-cost sensors and compared their performance. The eight ML algorithms are: (a) Multiple Linear Regression (MLR); (b) Random Forest (RF); (c) K-Nearest Neighbors (KNN); (d) Gradient Boosting Regression Tree (GBRT); (e) Decision Tree (DT); (f) AdaBoost; (g) Bagging; (h) Extremely Randomized Trees (Extra-Trees). Hourly raw data collected by 34 low-cost sensors deployed near the traditional weather stations were calibrated, and the models were tested using ten-fold cross-validation. The two most distant locations are 121 km apart in a straight line, and the longest record collected from a single sensor is 12,406 hours. In addition, model migration effects across different field scenarios were considered, covering six typical land surface types: built area, scrub, water, artificial surfaces, woodland, and cultivated land.

The results show that the random forest model performs better than the other models on multiple low-cost sensors at different locations. With this method, the average R-squared value improved from 0.682 to 0.980, the Root Mean Square Error (RMSE) fell from 5.989 to 1.355, and the Mean Absolute Error (MAE) fell from 4.250 to 0.932. The random forest model migrates better between similar scenarios. When each sensor is calibrated with a model from a surface type more similar to its own, the average R-squared over the 34 sensors is 0.946 and the average MAE is 1.584. The distance between the sensor to be calibrated and the best-performing migration model was also considered, with the farthest straight-line distance being 94 km and the nearest 7 km.

This study introduces a calibration method for low-cost integrated meteorological sensors used in long-term monitoring of complex field environments. Moreover, we compared the migration effect of the random forest model across typical field scenes. Similar surface types are more beneficial to model migration, and the model remains stable even between locations far apart. The results show that this method can significantly improve data quality and increase user confidence in low-cost environmental sensor applications.
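
The evaluation described in the abstract (ten-fold cross-validation scored with R-squared, RMSE, and MAE) can be illustrated with a minimal sketch. This is not the authors' code. It assumes a single sensor variable, uses synthetic data, and swaps in a one-variable least-squares fit as a stand-in for the random forest model; all function names (`fit_linear`, `k_fold_calibrate`, and so on) are hypothetical.

```python
import math
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_linear(x, y):
    """Ordinary least squares for y = a * x + b (stand-in calibration model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


def k_fold_calibrate(raw, ref, k=10, seed=0):
    """Out-of-fold calibration: each reading is corrected by a model
    fitted on the other k - 1 folds only."""
    idx = list(range(len(raw)))
    random.Random(seed).shuffle(idx)
    calibrated = [0.0] * len(raw)
    for fold in range(k):
        test = idx[fold::k]  # every k-th shuffled index forms one fold
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        a, b = fit_linear([raw[i] for i in train], [ref[i] for i in train])
        for i in test:
            calibrated[i] = a * raw[i] + b
    return calibrated


# Synthetic data (assumption): reference temperatures and a biased, noisy sensor.
rng = random.Random(42)
ref = [rng.uniform(0.0, 35.0) for _ in range(500)]
raw = [1.2 * t + 3.0 + rng.gauss(0.0, 1.0) for t in ref]

cal = k_fold_calibrate(raw, ref, k=10)
for name, metric in (("R2", r_squared), ("RMSE", rmse), ("MAE", mae)):
    print(f"{name}: raw={metric(ref, raw):.3f} calibrated={metric(ref, cal):.3f}")
```

Scoring only out-of-fold predictions keeps each metric from being computed on data the model was fitted to. To move closer to the study's setup, `fit_linear` could be replaced by a multi-feature regressor such as a random forest, provided the same fold-wise fit and predict pattern is kept.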
