Abstract
Outliers are common in large water quality monitoring time series. One approach to detecting them automatically combines a sliding window with the Dixon detection criterion, but it is limited because the window size is chosen empirically. Determining the optimal window size on a principled basis is therefore a worthwhile research problem. This paper presents a new Monte Carlo Search Method (MCSM), based on random sampling, that optimizes the size of the sliding window by drawing on both computing power and statistics. The MCSM was applied in a case study to automatic monitoring data for water quality factors in order to test its validity and usefulness. A comparison of accuracy and efficiency indicates that the new method is sound and effective. The experimental results show that, at different sample sizes, the average accuracy is between 58.70% and 75.75%, and the average computation time increase is between 17.09% and 45.53%. In the era of big data in environmental monitoring, the proposed method can meet the required accuracy of outlier detection and improve computational efficiency.
Highlights
The rapid development of the Internet of Things has promoted the application of smart sensors in the environmental field, contributing to the big data and multi-dimensional character of environmental monitoring [1,2].
To assess the correctness of the new Monte Carlo Search Method (MCSM), experiments with the Full Time Series Sliding Search Method (FTSSSM) were carried out alongside it for comparison.
When the sampling scale was 0.8n, the optimal window accuracy for the different water quality factors was between 67.5% and 85%.
Summary
The rapid development of the Internet of Things has promoted the application of smart sensors in the environmental field, contributing to the big data and multi-dimensional character of environmental monitoring [1,2]. Outlier processing is critical in environmental data analysis because it strongly affects subsequent analysis and modeling [3,4]. Automatic environmental monitoring values are typical time series data: they are collected over long periods, and their outliers have complex causes. There are many ways to detect outliers in time series, such as outlier detection based on prior rules [7], statistical distribution characteristics [8], the Kalman Filter Model (KLM) and Bayesian model [9], the Generalised Linear Model (GLM)-based algorithm [10], intelligence algorithms [3], etc.
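The combination of a sliding window with the Dixon criterion, and the idea of choosing the window size by random sampling, can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: it uses only the simple Dixon Q (r10) ratio with the commonly tabulated two-sided 95% critical values for window sizes 3 to 10, it draws random contiguous segments of length 0.8n, and it scores each window size as hits minus false alarms against a set of known outlier positions. The function names, the scoring rule, and the sampling scheme are all hypothetical.

```python
import random

# Two-sided 95% critical values for Dixon's Q (r10) statistic, n = 3..10.
# Assumption: the paper may use other Dixon ratios or confidence levels.
Q_CRIT_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
             7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}


def dixon_outlier_index(window):
    """Return the position of the extreme value that fails the Q test, else None."""
    n = len(window)
    order = sorted(range(n), key=lambda i: window[i])
    lo, hi = window[order[0]], window[order[-1]]
    spread = hi - lo
    if spread == 0:
        return None  # all values equal, nothing to test
    q_low = (window[order[1]] - lo) / spread
    q_high = (hi - window[order[-2]]) / spread
    if max(q_low, q_high) <= Q_CRIT_95[n]:
        return None
    return order[0] if q_low > q_high else order[-1]


def detect_outliers(series, k):
    """Slide a window of size k (3..10) over the series and collect flagged indices."""
    flagged = set()
    for start in range(len(series) - k + 1):
        idx = dixon_outlier_index(series[start:start + k])
        if idx is not None:
            flagged.add(start + idx)
    return flagged


def monte_carlo_window_search(series, known_outliers, sizes=range(3, 11),
                              trials=50, scale=0.8, seed=0):
    """Hypothetical search: score each window size on random segments, return the best."""
    rng = random.Random(seed)
    n = len(series)
    seg_len = max(3, int(scale * n))
    totals = {k: 0 for k in sizes}
    for _ in range(trials):
        start = rng.randrange(n - seg_len + 1)
        segment = series[start:start + seg_len]
        truth = {i - start for i in known_outliers if start <= i < start + seg_len}
        for k in sizes:
            found = detect_outliers(segment, k)
            # Assumed score: correctly flagged points minus false alarms.
            totals[k] += len(found & truth) - len(found - truth)
    return max(sizes, key=lambda k: totals[k])


series = [10.0, 10.1, 9.9, 10.2, 10.0, 25.0, 10.1, 9.8, 10.0, 10.2, 9.9, 10.1]
print(detect_outliers(series, 5))  # → {5}
best_k = monte_carlo_window_search(series, known_outliers={5})
```

Scoring on random segments rather than on every position of the full series is what makes the search a sampling method; a full sliding search over the whole series, as in the FTSSSM comparison, would evaluate each candidate window size exhaustively instead.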