Bayesian active learning isolation forest (B-ALIF): A weakly supervised strategy for anomaly detection

Gian Antonio Susto, Tommaso Barbariol, Davide Sartor

doi:10.1016/j.engappai.2023.107671

Abstract

Because labelling large datasets is costly, unsupervised Anomaly Detection models have been widely adopted across many applications. Anomaly Detection is typically embedded in Decision Support Systems, where it helps users monitor complex systems more efficiently. However, because the definition of an anomaly is often subjective or domain-specific, unsupervised models tend to perform poorly: they rely on general definitions of anomaly that might not coincide with the end-user's expectations. In many cases, human tagging of a small amount of data is feasible; in such scenarios, strategies that properly embed this information in the model and guide the tagging may be extremely useful. To address this problem, researchers are proposing models that interactively query the end-user, steering the detector towards the expected anomaly definition while reducing the human labelling effort. Such a scenario is especially feasible when Decision Support Systems are in place, since users have simple ways to interact with the Anomaly Detection modules. In this context, a novel Active Learning algorithm called Bayesian Active Learning Isolation Forest (B-ALIF) is proposed here. Starting from the solution provided by a popular detection model named Isolation Forest, it queries the user and updates itself by means of a Bayesian strategy. B-ALIF employs the unsupervised solution provided by the Isolation Forest as its prior and then improves its detection capabilities as the user interacts with it. Experimental results on real datasets show that, with limited human effort, the proposed approach learns faster than other state-of-the-art weakly supervised and supervised approaches.
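
The query-and-update loop described in the abstract can be illustrated with a deliberately simplified sketch. It is not the B-ALIF algorithm itself: the abstract does not specify the update rule, so the sketch assumes a generic Beta-Bernoulli scheme in which each member of an unsupervised ensemble (a stand-in for an isolation tree) casts a binary anomaly vote and carries a Beta-distributed reliability. All names (`ensemble_score`, `update`, `active_loop`, `vote_matrix`, `truth`) and the toy data are hypothetical.

```python
# Illustrative sketch only: a generic Beta-Bernoulli active-learning loop,
# NOT the update rule used by B-ALIF (an assumption made for illustration).

def ensemble_score(votes, alpha, beta):
    """Anomaly score: votes (0/1) weighted by each detector's posterior mean reliability."""
    weights = [a / (a + b) for a, b in zip(alpha, beta)]
    return sum(w * v for w, v in zip(weights, votes)) / sum(weights)

def update(votes, label, alpha, beta):
    """Bayesian update: agreeing with the user's label counts as a success, disagreeing as a failure."""
    for i, vote in enumerate(votes):
        if vote == label:
            alpha[i] += 1.0
        else:
            beta[i] += 1.0

def active_loop(vote_matrix, oracle, budget):
    """Query the user up to `budget` times, always on the most ambiguous unlabelled point."""
    n_detectors = len(vote_matrix[0])
    # Beta(1, 1) prior: every detector is trusted equally, so the initial
    # score is just the plain unsupervised ensemble average.
    alpha = [1.0] * n_detectors
    beta = [1.0] * n_detectors
    labelled = {}
    for _ in range(budget):
        candidates = [i for i in range(len(vote_matrix)) if i not in labelled]
        if not candidates:
            break
        # Uncertainty sampling: pick the point whose score is closest to 0.5.
        query = min(
            candidates,
            key=lambda i: abs(ensemble_score(vote_matrix[i], alpha, beta) - 0.5),
        )
        labelled[query] = oracle(query)
        update(vote_matrix[query], labelled[query], alpha, beta)
    return alpha, beta, labelled

# Toy data (hypothetical): the user's notion of anomaly and three detectors,
# one aligned with the user, one opposed, one unrelated.
truth = [0, 0, 0, 0, 1, 1, 0, 0]
unrelated = [0, 1, 0, 1, 0, 1, 0, 1]
vote_matrix = [[t, 1 - t, u] for t, u in zip(truth, unrelated)]

prior_score = ensemble_score(vote_matrix[4], [1.0] * 3, [1.0] * 3)
alpha, beta, labelled = active_loop(vote_matrix, lambda i: truth[i], budget=8)
posterior_score = ensemble_score(vote_matrix[4], alpha, beta)
```

In this toy run the detector that matches the user's labels gains weight with each answer, so the score of a true anomaly rises above its unsupervised starting value. A real system would obtain the votes from a fitted Isolation Forest and the labels from the end-user instead of the `truth` list.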

Full Text