A probabilistic generalization of isolation forest

Mikhail Tokovarov, Paweł Karczmarek

doi:10.1016/j.ins.2021.10.075

Abstract

The problem of finding anomalies and outliers in datasets is one of the most important challenges of modern data analysis. Among the tools commonly used for this task is Isolation Forest (IF), an efficient, conceptually simple, and fast method. In this study, we propose the Probabilistic Generalization of Isolation Forest (PGIF), an intuitively appealing and efficient enhancement of the original approach. The proposed generalization is based on a nonlinear dependence of the segment-cumulated probability on the length of the segment. Introducing the generalization makes it possible to achieve more effective splits, which tend to be performed between clusters (i.e., regions where datapoints form dense formations) rather than through them. In a comprehensive series of experiments, we show that the proposed method detects anomalies hidden between clusters more effectively. Moreover, we demonstrate that our approach improves the quality of anomaly detection on both artificial and real datasets. In terms of time complexity, our method is close to the original one, since the generalization affects only the building of the trees, while the scoring procedure (which takes most of the time) is kept unchanged.
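The abstract does not state the exact form of the nonlinear dependence, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes that, along a single feature, the segments are the gaps between consecutive sorted values and that the probability of splitting inside a segment is proportional to its length raised to a hypothetical exponent `alpha`. With `alpha = 1` this reduces to the uniform split of the original Isolation Forest; with `alpha > 1` wide gaps between clusters are favored. The names `segment_probabilities`, `sample_split`, and `alpha` are illustrative and do not come from the paper.

```python
import numpy as np


def segment_probabilities(values, alpha=2.0):
    """Return sorted values and the split probability of each gap.

    ASSUMPTION: probability is proportional to gap_length ** alpha.
    alpha is a hypothetical parameter; alpha = 1 gives the uniform
    split used by the original Isolation Forest.
    """
    xs = np.sort(np.asarray(values, dtype=float))
    gaps = np.diff(xs)  # lengths of segments between neighbours
    if gaps.sum() == 0:
        raise ValueError("all values are identical; nothing to split")
    weights = gaps ** alpha  # nonlinear dependence on segment length
    return xs, weights / weights.sum()


def sample_split(values, alpha=2.0, rng=None):
    """Draw one split value: pick a gap, then a uniform point inside it."""
    rng = np.random.default_rng() if rng is None else rng
    xs, probs = segment_probabilities(values, alpha)
    i = rng.choice(len(probs), p=probs)
    return rng.uniform(xs[i], xs[i + 1])


# Two dense clusters separated by one wide gap (illustrative data).
data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
_, p_uniform = segment_probabilities(data, alpha=1.0)
_, p_nonlinear = segment_probabilities(data, alpha=2.0)
split = sample_split(data, alpha=2.0, rng=np.random.default_rng(0))
```

In this toy example the wide gap between the two clusters receives a larger share of the split probability under `alpha = 2` than under the uniform rule, which is the qualitative behavior the abstract describes. Only tree construction would change under this reading; scoring by path length would stay as in the original method.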

Journal: Information Sciences	Publication Date: Nov 7, 2021
Citations: 40	License type: cc-by-nc-nd

Similar Papers

Robust Fuzzy Clustering Neural Networks
Zhao-Hong Deng
Journal of Software | VOL. 16
01 Jan 2004

Reliable and fast automatic artifact rejection of Long-Term EEG recordings based on Isolation Forest.
Runkai Zhang ... Haixian Wang
Medical & biological engineering & computing | VOL. 62
09 Nov 2023

Anomaly Pattern Detection in Streaming Data Based on the Transformation to Multiple Binary-Valued Data Streams
Taegong Kim ... Cheong Hee Park
Journal of Artificial Intelligence and Soft Computing Research | VOL. 12
08 Oct 2021

Connectedness-based subspace clustering
Namita Jain ... C A Murthy
Knowledge and Information Systems | VOL. 58
20 Mar 2018
