Abstract

Anomaly detection is a widely studied and applied area of data science that has produced many methods. Among them, the isolation forest algorithm stands out for its efficiency and effectiveness, especially on large-scale data. However, the algorithm has several drawbacks: it cannot handle local outliers effectively; because it partitions space along the coordinate axes, normal data can mask outliers; and it makes poor use of the training data. To address these issues, we propose an improved isolation forest algorithm based on multilayer subspace division, named layered isolation forest. It adapts to the different distributions within a dataset by dividing the sample space into subspaces and evaluating the anomaly degree of data over different spatial ranges. The resulting anomaly scores are more accurate and reasonable, which avoids the problems of the original algorithm and improves its performance metrics. In our experiments, the proposed method maintains the efficiency of the original algorithm and achieves the best overall performance among similar algorithms on synthetic datasets and real-world datasets from multiple domains.
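For readers unfamiliar with the baseline, the following is a minimal sketch of the original isolation forest scoring scheme with axis-parallel splits, not the layered method proposed here. The function names (`c_factor`, `build_tree`, `path_length`, `fit_forest`, `anomaly_score`), the parameter values, and the toy data are illustrative assumptions and do not come from the paper.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random coordinate axis at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on this axis; stop here
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, tree, depth=0):
    """Depth at which the point lands, plus a correction for unsplit leaves."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    child = tree["left"] if point[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(point, child, depth + 1)


def fit_forest(data, n_trees, sample_size, rng):
    """Build each tree on a random subsample drawn without replacement."""
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    forest = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
              for _ in range(n_trees)]
    return forest, psi


def anomaly_score(point, forest, psi):
    """Score in (0, 1); shorter average paths give scores closer to 1."""
    mean_path = sum(path_length(point, t) for t in forest) / len(forest)
    return 2.0 ** (-mean_path / c_factor(psi))


# Toy data (assumed for illustration): one Gaussian cluster plus a distant point.
rng = random.Random(0)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)] + [(8.0, 8.0)]
forest, psi = fit_forest(data, n_trees=100, sample_size=64, rng=rng)
outlier_score = anomaly_score((8.0, 8.0), forest, psi)
inlier_score = anomaly_score((0.0, 0.0), forest, psi)
print(f"outlier: {outlier_score:.3f}, inlier: {inlier_score:.3f}")
```

Each tree in this sketch sees only a small subsample and every split is parallel to a coordinate axis. These are the two properties the abstract links to low utilization of training data and to normal data masking outliers.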
