Accurate meteorological observation data are of great importance to human production activities. Meteorological observation systems are advancing toward automation, intelligence, and informatization. Yet instrument malfunctions and unstable sensor-node resources can cause the data to deviate significantly from the conditions they are meant to reflect. Achieving greater data accuracy requires early detection of data anomalies together with continuous collection and timely transmission of data. While obvious anomalies are readily identified, systematic and gradually emerging anomalies require further analysis. This study develops an interpretable deep learning method based on an autoencoder (AE), SHapley Additive exPlanations (SHAP), and Bayesian optimization (BO) to support prompt and accurate anomaly detection in meteorological observation data. The proposed method comprises four parts. First, the AE detects anomalies in multidimensional meteorological datasets by flagging data with significant reconstruction errors. Second, SHAP is used to evaluate the importance of each meteorological element in the flagged data. Third, a K-sigma-based automatic threshold delineation method derives reasonable anomaly thresholds adapted to the data characteristics of each observation site. Finally, the BO algorithm tunes hard-to-tune hyperparameters, improving the model structure and thus the accuracy of anomaly detection. In practice, the proposed model can inform agricultural production, climate observation, and disaster prevention.
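The K-sigma thresholding step can be illustrated with a minimal sketch. The abstract does not state the exact rule, so the code below assumes the common form, threshold = mean + k × standard deviation of one site's reconstruction errors. The function names, the default k = 3, and the synthetic errors are hypothetical and stand in for the output of a trained AE.

```python
import numpy as np


def k_sigma_threshold(errors, k=3.0):
    """Assumed K-sigma rule: mean + k * std of the reconstruction errors."""
    errors = np.asarray(errors, dtype=float)
    return errors.mean() + k * errors.std()


def flag_anomalies(errors, k=3.0):
    """Return a boolean mask marking errors above the site's threshold."""
    errors = np.asarray(errors, dtype=float)
    return errors > k_sigma_threshold(errors, k)


# Synthetic per-sample reconstruction errors for one hypothetical site.
rng = np.random.default_rng(0)
site_errors = np.abs(rng.normal(0.0, 0.1, size=500))
site_errors[[50, 300]] = [2.0, 3.5]  # two injected anomalies

flags = flag_anomalies(site_errors, k=3.0)
print("threshold:", k_sigma_threshold(site_errors, k=3.0))
print("flagged indices:", np.flatnonzero(flags))
```

Because the threshold is computed from each site's own error distribution, sites with noisier sensors receive a higher threshold than quieter ones, which is one plausible reading of thresholds "adapted to the data characteristics of each observation site".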