Abstract

Machine learning (ML) techniques promise to revolutionize environmental research and management, but collecting the necessary volumes of high-quality data remains challenging. Environmental sensors are often deployed under harsh conditions, requiring labor-intensive quality assurance and quality control (QAQC) processes. This need for manual QAQC is a major impediment to the scalability of sensor networks. Existing techniques for automated QAQC, however, make strong assumptions about the noise profiles of the data they filter, and these assumptions do not necessarily hold for broadly deployed environmental sensors. Toward the goal of increasing the volume of high-quality environmental data, we introduce an ML-assisted QAQC methodology that is robust to data with a low signal-to-noise ratio. Our approach embeds sensor measurements into a dynamical feature space and trains a binary classifier (a Support Vector Machine) to detect deviations from expected process dynamics, which indicate that a sensor has become compromised and requires maintenance. This strategy enables the automated detection of a wide variety of nonphysical signals. We apply the methodology to three novel data sets produced by 136 low-cost environmental sensors (stream level, drinking water pH, and drinking water electroconductivity), deployed by our group across 250,000 km² in Michigan, USA. The proposed methodology achieved accuracy scores of up to 0.97 and consistently outperformed state-of-the-art anomaly detection techniques.
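The abstract does not say how measurements are embedded into a dynamical feature space. As a minimal sketch only, the example below assumes a time-delay embedding, one common way to expose a signal's dynamics, and is not a description of the authors' actual feature construction. The function names `delay_embed` and `max_step`, the parameters `dim` and `lag`, and the toy readings are all hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def delay_embed(series: Sequence[float], dim: int = 3, lag: int = 1) -> List[List[float]]:
    """Map a scalar series to delay vectors [x[t], x[t-lag], ..., x[t-(dim-1)*lag]].

    ASSUMPTION: time-delay embedding stands in for the paper's unspecified
    "dynamical feature space".
    """
    if dim < 1 or lag < 1:
        raise ValueError("dim and lag must be positive integers")
    first = (dim - 1) * lag  # earliest index that has a full history
    return [[series[t - k * lag] for k in range(dim)] for t in range(first, len(series))]


def max_step(vector: Sequence[float]) -> float:
    """Largest absolute change between consecutive entries of one delay vector."""
    return max((abs(a - b) for a, b in zip(vector, vector[1:])), default=0.0)


# Hypothetical stream-level readings (metres) with one nonphysical spike at index 4.
# These values are invented for illustration; they are not data from the study.
readings = [1.00, 1.02, 1.03, 1.05, 9.99, 1.06, 1.08]

vectors = delay_embed(readings, dim=3, lag=1)
steps = [max_step(v) for v in vectors]

for vec, step in zip(vectors, steps):
    print(vec, round(step, 2))
```

Every delay vector that contains the spike shows a large internal jump, while vectors built from ordinary readings change smoothly. In a pipeline of the kind the abstract describes, vectors like these, paired with labels marking compromised and healthy periods, could serve as inputs to a binary classifier such as scikit-learn's `sklearn.svm.SVC`. That pairing is likewise an assumption rather than a detail confirmed by the abstract.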

