Incremental anomaly detection using two-layer cluster-based structure

Elnaz Bigdeli, Mahdi Mohammadi, Bijan Raahemi, Stan Matwin

doi:10.1016/j.ins.2017.11.023

Abstract

Anomaly detection algorithms face several challenges, including processing speed, adapting to changes in dynamic environments, and dealing with noise in data. In this paper, a two-layer cluster-based anomaly detection structure is presented which is fast, noise-resilient and incremental. The proposed structure comprises three main steps. In the first step, the data are clustered. The second step is to represent each cluster in a way that enables the model to classify new instances. The Summarization based on Gaussian Mixture Model (SGMM) proposed in this paper represents each cluster as a GMM. In the third step, a two-layer structure efficiently updates clusters using the GMM representation, while detecting and ignoring redundant instances. A new approach, called Collective Probabilistic Labeling (CPL), is presented to update clusters incrementally. This approach makes the updating phase noise-resistant and fast. An important step in the updating is the merging of new clusters with existing ones. To this end, a new distance measure is proposed, which is a modified Kullback–Leibler distance between two GMMs.

In most real-time anomaly detection applications, incoming instances are often similar to previous ones. In these cases, there is no need to update clusters based on duplicates, since they have already been modeled in the cluster distribution. The two-layer structure is responsible for identifying redundant instances. Ignoring redundant instances, which are typically in the majority, makes the detection phase faster.

The proposed method is found to lower the false alarm rate, which is one of the basic problems of the one-class SVM. Experiments show that, with the two-layer structure, the false alarm rate decreases by 5% to 15% and the detection rate increases by 5% to 10%, depending on the dataset. The two-layer structure uses 20 to 50 times less memory than the one-class SVM.
The one-class SVM uses support vectors to label new instances, whereas the labeling cost of the two-layer structure depends on the number of GMMs. The experiments show that the two-layer structure is 20 to 50 times faster than the one-class SVM in labeling new instances. Moreover, the updating time of the two-layer structure is two to three times lower than that of a one-layer structure. This reduction results from using the two-layer structure and ignoring redundant instances.
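
The idea of skipping redundant instances can be illustrated with a minimal sketch. It is not the SGMM or CPL procedure of the paper: it assumes one-dimensional data, clusters already summarized as GMMs given as lists of (weight, mean, std) tuples, and a hypothetical density threshold above which an instance is treated as already modeled. The names `gmm_pdf`, `filter_redundant` and `density_threshold` are illustrative only.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def gmm_pdf(x, gmm):
    """Density of a 1-D GMM given as a list of (weight, mean, std) tuples."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in gmm)


def filter_redundant(stream, cluster_gmms, density_threshold=0.05):
    """Split a stream into instances to keep for updating and a skip count.

    Assumption (not from the paper): an instance is redundant when its
    density under at least one existing cluster GMM reaches the threshold.
    Redundant instances are dropped; the rest are buffered so that a later
    step could form new clusters or update existing ones.
    """
    buffered = []
    skipped = 0
    for x in stream:
        best_density = max(gmm_pdf(x, gmm) for gmm in cluster_gmms)
        if best_density >= density_threshold:
            skipped += 1          # already well modeled, no update needed
        else:
            buffered.append(x)    # not explained well enough, keep it
    return buffered, skipped


# Two hypothetical clusters: one single Gaussian, one two-component mixture.
cluster_gmms = [
    [(1.0, 0.0, 1.0)],
    [(0.5, 10.0, 1.0), (0.5, 12.0, 1.0)],
]
buffered, skipped = filter_redundant([0.1, -0.3, 10.5, 3.0, 30.0], cluster_gmms)
print(buffered, skipped)
```

In this sketch the cost of handling one instance grows with the total number of Gaussian components rather than with the number of stored instances, which is in line with the abstract's remark that labeling depends on the number of GMMs.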

Full Text