Abstract
Environmental sensor networks play a crucial role in monitoring key parameters essential for understanding Earth’s systems. Effective quality control (QC) is needed to ensure the reliability and accuracy of the collected data. Conventional QC methods struggle to handle the complexity of environmental data, while advanced techniques such as neural networks are typically not designed to process data from sensor networks with irregular spatial distribution. In this study, we focus on anomaly detection in environmental sensor networks using graph neural networks, which can represent sensor network structures as graphs. We investigate their performance on two datasets with distinct dynamics and resolution: commercial microwave link (CML) signal levels used for rainfall estimation and SoilNet soil moisture measurements. To evaluate the benefit of incorporating neighboring sensor information for anomaly detection, we compare two models: a graph convolutional network (GCN) and a graph-less baseline, a long short-term memory (LSTM) network. Evaluation through 5-fold cross-validation shows that the GCN models outperform the baseline. For CML, the mean area under the receiver operating characteristic curve was 0.941 for the GCN compared to 0.885 for the baseline LSTM; for SoilNet, it was 0.858 for the GCN and 0.816 for the baseline LSTM. Visual inspection of CML time series revealed that the GCN classified anomalies well and remained robust to rain-induced events that the baseline LSTM often misidentified as anomalies. For SoilNet, however, the advantage of the GCN was less pronounced, likely due to inconsistent and less precise labeling. Through interpretable model analysis, we demonstrate how feature attributions illustrate the significance of neighboring sensor data, particularly in distinguishing between anomalies and expected changes in signal level in the time series.