Abstract

Anomaly detection is a fundamental problem in data mining and has attracted wide attention and use in both industry and academia. As the volume of hydrological data continues to grow, existing anomaly detection algorithms have become too slow. They also report too many anomalous points, leaving analysts and decision-makers without a clear starting point. To address this problem, this paper applies the isolation forest algorithm to a pattern representation of hydrological data and proposes an isolation-forest-based algorithm for detecting anomalous patterns in hydrological time series. Because the partition threshold in isolation forest is difficult to determine and the algorithm cannot output the top-k anomalies, k-means clustering and a nearest-neighbor algorithm, respectively, are used to improve it. These improvements overcome the subjectivity of manually setting the threshold and make the reported results more stable. The proposed algorithm is applied to measured data from the Chuhe River Basin and compared with other improved algorithms in terms of accuracy and time complexity. The experiments verify the effectiveness of the improved isolation forest algorithm.
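The idea of replacing a hand-set threshold with a clustering step can be illustrated with a minimal sketch. It is not the paper's implementation: the feature tuples, the function names (`c_factor`, `build_tree`, `anomaly_scores`, `two_means_threshold`), the tree count, and the sample size are all assumptions made for illustration. The pattern-representation step and the nearest-neighbor step are not shown. The sketch builds a small isolation forest, computes the standard anomaly score, and then splits the scores with a one-dimensional 2-means clustering.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Isolate points with random axis-aligned splits (one isolation tree)."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # nothing to split on in this dimension
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, point, depth=0):
    """Depth at which `point` lands, plus the usual leaf-size correction."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, dim, split, left, right = tree
    child = left if point[dim] < split else right
    return path_length(child, point, depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=64, seed=0):
    """Isolation forest score 2^(-E[h(x)] / c(n)); higher means more anomalous."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    n = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(n))
    trees = [build_tree(rng.sample(points, n), 0, max_depth, rng)
             for _ in range(n_trees)]
    scores = []
    for p in points:
        mean_path = sum(path_length(t, p) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_path / c_factor(n)))
    return scores


def two_means_threshold(scores, iters=50):
    """1-D k-means with k=2 on the scores; returns the midpoint of the centers.

    Assumption: this stands in for the paper's k-means step, whose exact
    form is not given in the abstract.
    """
    lo, hi = min(scores), max(scores)
    for _ in range(iters):
        low = [s for s in scores if abs(s - lo) <= abs(s - hi)]
        high = [s for s in scores if abs(s - lo) > abs(s - hi)]
        if not high:  # all scores identical: no second cluster
            break
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Hypothetical pattern features (e.g. segment mean and slope), not real data.
    rng = random.Random(1)
    data = [(rng.gauss(10.0, 1.0), rng.gauss(0.0, 0.1)) for _ in range(60)]
    data.append((25.0, 3.0))  # one obviously unusual pattern
    scores = anomaly_scores(data)
    threshold = two_means_threshold(scores)
    print([i for i, s in enumerate(scores) if s > threshold])
```

Deriving the cut from the scores themselves means no fixed contamination rate or score cutoff has to be chosen by hand, which is the subjectivity the abstract refers to.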
