Abstract
Anomaly detection in real-time data is accepted as a vital area of research. Clustering techniques have effectively been applied for the detection of anomalies several times. As the datasets are real time, the time of data generation is important. Most of the existing clustering-based methods either follow a partitioning or a hierarchical approach without addressing time attributes of the dataset distinctly. In this article, a mixed clustering approach is introduced for this purpose, which also takes time attributes into consideration. It is a two-phase method that first follows a partitioning approach, then an agglomerative hierarchical approach. The dataset can have mixed attributes. In phase one, a unified metric is used that is defined based on mixed attributes. The same metric is also used for merging similar clusters in phase two. Tracking of the time stamp associated with each data instance is conducted simultaneously, producing clusters with different lifetimes in phase one. Then, in phase two, the similar clusters are merged along with their lifetimes. While merging the similar clusters, the lifetimes of the corresponding clusters with overlapping cores are merged using superimposition operation, producing a fuzzy time interval. This way, each cluster will have an associated fuzzy lifetime. The data instances either belonging to sparse clusters, not belonging to any of the clusters or falling in the fuzzy lifetimes with low membership values can be treated as anomalies. The efficacy of the algorithms can be established using both complexity analysis as well as experimental studies. The experimental results with a real world dataset and a synthetic dataset show that the proposed algorithm can detect the anomalies with 90% and 98% accuracy, respectively.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.