Abstract

Anomaly detection, or outlier detection, refers to the problem of finding patterns in data that do not conform to expected behavior. It is of great significance in intrusion detection, fraud detection, and public health anomaly detection. Isolation Forest (iForest) is a widely used algorithm with excellent performance and a low memory requirement compared to traditional distance-based methods. However, iForest is sensitive to global outliers and is weak at distinguishing local outliers from low-density normal clusters. To address this problem, a hybrid anomaly detection framework based on Isolation, Density and Clustering (HAD-IDC) is proposed. It first uses iForest, which has low complexity, to quickly scan the dataset and split it into two subsets: the outlier candidate set and the inlier candidate set. Local Outlier Factor (LOF) and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) are then applied to these two subsets, respectively. LOF is applied to the inlier candidate set to detect local outliers that iForest missed in the first step, and HDBSCAN is applied to the outlier candidate set to separate low-density normal clusters from local outliers. Experimental results on official benchmark datasets show that HAD-IDC significantly improves detection accuracy over existing methods in terms of AUC (area under the curve), while reducing computation time.
