Abstract

Modern technologies let researchers and practitioners explore large datasets. In this setting, anomaly detection methods, used to find unwanted records so that they can be corrected or deleted, are of great importance. One of the fastest and most effective anomaly detection algorithms is Isolation Forest, which builds binary isolation trees by randomly splitting the dataset elements. In this manuscript, we propose a modification of this technique. In particular, we replace the random divisions of the base mechanism with divisions based on Minimal Spanning Tree clustering. Additionally, we improve the evaluation process by introducing a two-component score function. The first component depends on the level (depth) reached by the test element in the isolation tree. The second is the distance, in the last split node, between the value of the evaluated attribute and the partition center stored in that node. In a series of comprehensive experiments, the proposed approach was compared with other Isolation Forest-based algorithms as well as with state-of-the-art competing solutions, and it achieved better classification quality. We also measured the running times of selected implementations. The results demonstrate the high effectiveness of the proposed approach.
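The following Python sketch illustrates one possible reading of the two ideas in the abstract; it is not the authors' implementation. The abstract does not say how the Minimal Spanning Tree clustering is applied or how the two score components are combined, so several choices here are assumptions: each split uses a single randomly chosen attribute, and on a line the MST simply links sorted neighbours, so cutting its longest edge (a single-linkage split into two clusters) means splitting at the widest gap; the partition center is taken to be the mean of that partition; and the depth term uses the standard Isolation Forest normalization 2^(-E[h]/c(psi)), while the distance term is squashed as d/(1+d) and added with a hypothetical weight alpha. The names MSTIsolationForest, mst_split, build, path, psi and alpha are all hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average unsuccessful-search path length in a BST of n points
    (the normalisation used by standard Isolation Forest)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def mst_split(values):
    """Hypothetical 1-D MST split: on a line the MST links sorted neighbours,
    so cutting its longest edge separates the data at the widest gap."""
    s = sorted(values)
    if s[0] == s[-1]:
        return None  # all values equal: no edge to cut
    k = max(range(len(s) - 1), key=lambda i: s[i + 1] - s[i])
    return (s[k] + s[k + 1]) / 2.0


def build(X, depth, limit, rng):
    """Grow one tree; each internal node stores both partition centers."""
    if depth >= limit or len(X) <= 1:
        return {"size": len(X)}
    attrs = [a for a in range(len(X[0]))
             if min(r[a] for r in X) < max(r[a] for r in X)]
    if not attrs:
        return {"size": len(X)}
    a = rng.choice(attrs)  # random attribute, MST-based threshold
    t = mst_split([r[a] for r in X])
    left = [r for r in X if r[a] < t]
    right = [r for r in X if r[a] >= t]
    if not left or not right:  # guard against float rounding of the midpoint
        return {"size": len(X)}
    centers = tuple(sum(r[a] for r in p) / len(p) for p in (left, right))
    return {"attr": a, "thr": t, "centers": centers,
            "kids": (build(left, depth + 1, limit, rng),
                     build(right, depth + 1, limit, rng))}


def path(node, x):
    """Return (path length, distance to the partition center at the last split)."""
    depth, dist = 0, 0.0
    while "attr" in node:
        a = node["attr"]
        side = 0 if x[a] < node["thr"] else 1
        dist = abs(x[a] - node["centers"][side])
        node = node["kids"][side]
        depth += 1
    return depth + c(node["size"]), dist


class MSTIsolationForest:
    def __init__(self, n_trees=100, psi=256, alpha=0.5, seed=0):
        self.n_trees, self.psi, self.alpha, self.seed = n_trees, psi, alpha, seed

    def fit(self, X):
        rng = random.Random(self.seed)
        self.psi = min(self.psi, len(X))  # assumes at least 2 training points
        limit = math.ceil(math.log2(self.psi))
        self.trees = [build(rng.sample(X, self.psi), 0, limit, rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x):
        """Higher means more anomalous (hypothetical combination of both terms)."""
        hs, ds = zip(*(path(t, x) for t in self.trees))
        depth_term = 2.0 ** (-(sum(hs) / len(hs)) / c(self.psi))
        mean_dist = sum(ds) / len(ds)
        return depth_term + self.alpha * mean_dist / (1.0 + mean_dist)
```

For instance, fitting the forest on a Gaussian point cloud plus one far point such as (10, 10) should give that point a higher score than the origin, mainly because the widest gap isolates it near the root. The d/(1+d) squashing and the weight alpha are arbitrary choices made only so that both terms lie in [0, 1); the published method may normalize and combine them differently.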
