Abstract

Anomaly detection is one of the most studied problems in modern data analysis, with applications across a wide range of fields in science and technology. One of the most frequently used anomaly detection methods is Isolation Forest. In this study, we propose a novel, efficient approach based on this technique. To improve the classification accuracy of the base method, we make two modifications. First, we change how isolation trees are built, merging nodes by means of a minimal spanning tree algorithm. Second, we modify the function that assesses the anomaly of the analyzed element (data record) so that it is a sum of factors correlated with tree height and nearest-point distance. In a series of comprehensive computational experiments, the proposed method produced better results than the other compared state-of-the-art methods available in popular data mining programming libraries. Notably, the final version of the new method is 2.9% better than the original Isolation Forest in terms of the AUC measure.
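
To make the idea of a score built from a tree-height factor plus a nearest-point-distance factor concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions, not the method evaluated in the study: the trees are plain isolation trees (the minimal-spanning-tree node merging is omitted because the abstract does not specify it), and the equal weighting, the normalization of the distance term, and all function names (`combined_scores`, `depth_score`, `auc`, and so on) are hypothetical choices made for this example.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Plain isolation tree: split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, x, depth=0):
    """Depth at which x reaches a leaf, adjusted for the leaf's unsplit points."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    child = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(child, x, depth + 1)


def fit_forest(data, n_trees, sample_size, seed):
    """Build n_trees isolation trees, each on a random subsample of the data."""
    rng = random.Random(seed)
    m = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(m)) if m > 1 else 0
    forest = [build_tree(rng.sample(data, m), 0, max_depth, rng) for _ in range(n_trees)]
    return forest, m


def depth_score(forest, m, x):
    """Standard Isolation Forest score in (0, 1]; shorter paths give higher scores."""
    mean_len = sum(path_length(t, x) for t in forest) / len(forest)
    return 2.0 ** (-mean_len / (c_factor(m) or 1.0))


def combined_scores(data, weight=0.5, n_trees=100, sample_size=64, seed=0):
    """Assumed combination: weighted sum of depth score and scaled nearest-point distance."""
    forest, m = fit_forest(data, n_trees, sample_size, seed)
    depth = [depth_score(forest, m, x) for x in data]
    nearest = [
        min(math.dist(x, p) for j, p in enumerate(data) if j != i)
        for i, x in enumerate(data)
    ]
    top = max(nearest) or 1.0  # scale distances into [0, 1]
    return [weight * d + (1.0 - weight) * r / top for d, r in zip(depth, nearest)]


def auc(scores, labels):
    """AUC as the fraction of (anomaly, normal) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Calling `combined_scores(data)` on a list of equal-length numeric tuples (at least two points) returns one score per record, with higher values indicating more anomalous records; `auc(scores, labels)` then compares those scores against 0/1 ground-truth labels, which is the kind of measure the reported 2.9% improvement refers to.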
