Abstract

• A data-driven ensemble model for estimating time-series data is developed.
• Statistical variability and an exogenous factor are considered in the estimation.
• The normal data range is calculated from the ensemble estimation result.
• Contextual and collective outliers in time-series data are effectively identified.

In this study, a method for estimating the normal range of groundwater-level time-series data was developed to identify global, contextual, and collective outliers. To evaluate this normal range, the statistical characteristics of the groundwater-level data and the patterns of the precipitation time-series data were incorporated into an LSTM (Long Short-Term Memory)-based ensemble regressor (the LER model). The LER model generates multiple plausible trends of the groundwater level, and general outlier identification rules (the σ and Tukey's fences (TF) rules) are applied to this ensemble estimation result to define the normal data range. To validate outlier identification performance, groundwater-level data from three monitoring wells in South Korea (Pohang–Gibuk (PG), Namwon–Dotong (ND), and Jeju–Sangyae (JS)) and precipitation data from the nearest weather stations were used. As the reference method for comparison, simple applications of the σ and TF rules were used. For the monitoring data, the developed LER-based method evaluates the range of data that can be explained by the modelled influences of interest (i.e., the normal data range). The developed method generally achieved an outlier identification performance above 70%, whereas that of the σ and TF rules was mostly below 50%.
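The step in which the σ and TF rules are applied to an ensemble estimation result can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that the ensemble members (for example, outputs of several LSTM regressors) are already available as equal-length sequences and that the rules are applied separately at each time step. It also assumes multipliers of `k = 3` for the σ rule and `k = 1.5` for Tukey's fences, which are conventional defaults. The function names `sigma_range`, `tukey_range`, and `flag_outliers` and the numbers in the example are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple

# Minimal sketch (not the authors' code): derive a per-time-step normal range
# from an ensemble of estimated trends and flag observations outside it.
# Assumption: ensemble members are given as equal-length sequences; how they
# are produced (e.g. by LSTM regressors) is out of scope here.


def sigma_range(values: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Sigma rule: mean +/- k * sample standard deviation (k = 3 assumed)."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return mu - k * sd, mu + k * sd


def tukey_range(values: Sequence[float], k: float = 1.5) -> Tuple[float, float]:
    """Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR] (k = 1.5 assumed)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(observed: Sequence[float],
                  ensemble: Sequence[Sequence[float]],
                  rule: str = "sigma") -> List[bool]:
    """Return True for each time step whose observation falls outside the
    normal range computed from the ensemble estimates at that step."""
    rules = {"sigma": sigma_range, "tukey": tukey_range}
    if rule not in rules:
        raise ValueError(f"unknown rule: {rule!r}")
    range_fn = rules[rule]
    flags = []
    for t, obs in enumerate(observed):
        # Gather every ensemble member's estimate for time step t.
        estimates = [member[t] for member in ensemble]
        lo, hi = range_fn(estimates)
        flags.append(obs < lo or obs > hi)
    return flags


# Hypothetical example: five ensemble members over four time steps.
ensemble = [
    [10.0, 10.2, 10.5, 10.3],
    [10.1, 10.3, 10.4, 10.2],
    [9.9, 10.1, 10.6, 10.4],
    [10.0, 10.2, 10.5, 10.3],
    [10.05, 10.25, 10.45, 10.35],
]
observed = [10.0, 12.0, 10.5, 10.3]
print(flag_outliers(observed, ensemble, rule="sigma"))  # [False, True, False, False]
print(flag_outliers(observed, ensemble, rule="tukey"))  # [False, True, False, False]
```

Because the range is recomputed at every time step, it follows the ensemble's estimated trend instead of a single global threshold. This is what lets a method of this kind react to seasonal context.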
In particular, because the method effectively estimated the seasonal trend and variability of the groundwater level by accounting for precipitation patterns and statistics of groundwater-level variation, it outperformed the simple σ and TF rules in identifying contextual and collective outliers. In-depth analysis indicates that the developed LER-based outlier identification method effectively discriminates abnormal data by considering both the intrinsic statistical characteristics of the original data trend and exogenous factors. In terms of practical applicability, because results can be obtained automatically from real-time monitoring data, the method is expected to support more efficient maintenance of monitoring devices when the model is embedded as management software in the monitoring network system.
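A toy example can show why a time-varying range helps with contextual outliers. The series below is hypothetical synthetic data, not data from the PG, ND, or JS wells. It is a yearly cycle sampled monthly. A value typical of the seasonal high is injected at a seasonal low. Global σ and TF rules computed over the whole series do not flag this value, but a band around a time-varying expected level does. Here `expected` stands in for a central ensemble estimate and `half_width` for an ensemble-derived spread; both, like the helper names, are assumptions for illustration.

```python
import math
import statistics
from typing import List, Sequence

# Toy illustration with hypothetical, synthetic data (not from the PG, ND, or
# JS wells): global rules versus a time-varying band for a contextual outlier.


def global_sigma_flags(series: Sequence[float], k: float = 3.0) -> List[bool]:
    """Flag points farther than k sample standard deviations from the global mean."""
    mu = statistics.mean(series)
    sd = statistics.stdev(series)
    return [abs(x - mu) > k * sd for x in series]


def global_tukey_flags(series: Sequence[float], k: float = 1.5) -> List[bool]:
    """Flag points outside global Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(series, n=4)
    iqr = q3 - q1
    return [x < q1 - k * iqr or x > q3 + k * iqr for x in series]


def band_flags(series: Sequence[float], expected: Sequence[float],
               half_width: float) -> List[bool]:
    """Flag points farther than half_width from a time-varying expected level."""
    return [abs(x - e) > half_width for x, e in zip(series, expected)]


# Synthetic monthly levels over three years with a yearly cycle (amplitude 2).
expected = [10.0 + 2.0 * math.sin(2 * math.pi * t / 12) for t in range(36)]
observed = list(expected)
observed[21] = 12.0  # seasonal-high value injected at a seasonal low (8.0)

print(any(global_sigma_flags(observed)))  # False
print(any(global_tukey_flags(observed)))  # False
print([t for t, f in enumerate(band_flags(observed, expected, 1.0)) if f])  # [21]
```

For simplicity, the band width here is fixed. In an ensemble-based approach such as the one described above, the width at each time step would presumably come from the spread of the ensemble estimates.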