Anomaly detection in real-time data is widely regarded as a vital area of research, and clustering techniques have repeatedly been applied to it with success. Because such datasets are generated in real time, the time at which each data instance is generated is important. Most existing clustering-based methods follow either a partitioning or a hierarchical approach and do not address the time attributes of the dataset distinctly. This article introduces a mixed clustering approach for anomaly detection that also takes time attributes into consideration. It is a two-phase method: a partitioning approach in phase one, followed by an agglomerative hierarchical approach in phase two. The dataset may have mixed attributes. Phase one uses a unified metric defined over the mixed attributes, and the same metric is used in phase two to merge similar clusters. The time stamp associated with each data instance is tracked simultaneously, so phase one produces clusters with different lifetimes. In phase two, similar clusters are merged along with their lifetimes: the lifetimes of the corresponding clusters with overlapping cores are merged using a superimposition operation, which produces a fuzzy time interval. Each resulting cluster therefore has an associated fuzzy lifetime. Data instances that belong to sparse clusters, belong to no cluster, or fall in a fuzzy lifetime with a low membership value can be treated as anomalies. The efficacy of the algorithms can be established through both complexity analysis and experimental studies. Experimental results show that the proposed algorithm detects anomalies with 90% accuracy on a real-world dataset and 98% accuracy on a synthetic dataset.
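The abstract does not define the superimposition operation, so the following is only a minimal sketch of how a fuzzy lifetime might be evaluated. It assumes one common reading of interval superimposition, in which the membership of a time point is the fraction of the merged crisp lifetimes that contain it. The function names, the interval representation, and the 0.5 threshold are hypothetical and are not taken from the article.

```python
from typing import List, Tuple

# A crisp cluster lifetime as (start, end); the representation is assumed.
Interval = Tuple[float, float]


def fuzzy_lifetime_membership(lifetimes: List[Interval], t: float) -> float:
    """Return the fraction of merged crisp lifetimes that contain time t.

    Assumption: superimposing n lifetimes gives a point covered by k of
    them a membership value of k / n.
    """
    if not lifetimes:
        raise ValueError("at least one lifetime is required")
    covering = sum(1 for start, end in lifetimes if start <= t <= end)
    return covering / len(lifetimes)


def is_temporal_anomaly(lifetimes: List[Interval], t: float,
                        threshold: float = 0.5) -> bool:
    """Flag t when its membership is below a threshold (assumed value)."""
    return fuzzy_lifetime_membership(lifetimes, t) < threshold


# Two cluster lifetimes merged in phase two; they overlap on [5, 10].
merged = [(0.0, 10.0), (5.0, 15.0)]
in_core = fuzzy_lifetime_membership(merged, 7.0)   # covered by both
on_edge = fuzzy_lifetime_membership(merged, 2.0)   # covered by one
outside = fuzzy_lifetime_membership(merged, 20.0)  # covered by none
```

Under this reading, time points inside the overlap of all merged lifetimes receive full membership, points covered by only some lifetimes receive partial membership, and a data instance whose time stamp has low membership becomes a candidate anomaly.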