Abstract

As more applications generate high-volume data streams, numerous clustering methods have emerged to process these streams and extract insights from them. These methods typically operate in two phases, online and offline. In the online phase, summaries of the incoming data are stored in micro-clusters, which serve as compact representations of the data. In the offline phase, static clustering techniques are applied to these micro-clusters to derive the final clusters. However, these methods often use fixed parameters to create micro-clusters in the online phase, which can cause data loss as behavioral patterns in the stream evolve over time. In this study, we propose an approach that addresses this limitation. We introduce a dynamic radius threshold for each micro-cluster in the online phase, allowing fine-grained adaptation to statistical changes in the distribution of the data stream. We also present a new method for generating the final clusters in the offline phase. By considering both the shared density and the distance between micro-clusters, it accounts for the density relationships that previous approaches neglect, leading to more accurate clusters. To evaluate the proposed method, we conduct extensive experiments on synthetic, real-world benchmark, and Twitter datasets. The results show that our approach outperforms state-of-the-art methods in identifying the correct clusters in evolving data streams.
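
To make the online phase concrete, the following is a minimal sketch of a micro-cluster whose radius threshold is recomputed from its own statistics rather than fixed in advance. The abstract does not state the update rule, so everything specific here is an assumption made for illustration: the names `MicroCluster` and `assign`, the parameters `min_radius` and `scale`, and the rule "threshold = max(min_radius, scale × standard deviation)". It should not be read as the method proposed in the paper.

```python
import math


class MicroCluster:
    """Compact summary of nearby points: count, linear sum, squared sum."""

    def __init__(self, point, min_radius=1.0, scale=2.0):
        # min_radius and scale are hypothetical parameters, not from the paper.
        self.n = 1
        self.ls = list(point)                 # per-dimension linear sum
        self.ss = [x * x for x in point]      # per-dimension squared sum
        self.min_radius = min_radius
        self.scale = scale

    def center(self):
        return [s / self.n for s in self.ls]

    def deviation(self):
        # Root of the summed per-dimension variance; clamped at 0 to guard
        # against small negative values from floating-point rounding.
        var = sum(
            max(ss / self.n - (ls / self.n) ** 2, 0.0)
            for ls, ss in zip(self.ls, self.ss)
        )
        return math.sqrt(var)

    def threshold(self):
        # Assumed rule: the radius threshold follows the cluster's spread,
        # with a floor so that a single-point cluster can still absorb points.
        return max(self.min_radius, self.scale * self.deviation())

    def absorb(self, point):
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x


def assign(clusters, point):
    """Add point to the nearest micro-cluster if it falls within that
    cluster's current threshold; otherwise start a new micro-cluster."""
    if clusters:
        nearest = min(clusters, key=lambda c: math.dist(c.center(), point))
        if math.dist(nearest.center(), point) <= nearest.threshold():
            nearest.absorb(point)
            return nearest
    created = MicroCluster(point)
    clusters.append(created)
    return created


clusters = []
for p in [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0)]:
    assign(clusters, p)
print([(c.n, c.center()) for c in clusters])
```

Because each micro-cluster derives its threshold from its own count, linear sum, and squared sum, a cluster that starts to receive more widely spread points widens its acceptance region without any global parameter change. The offline step, which the abstract says combines shared density and distance between micro-clusters, is not sketched here because the abstract gives no detail on how the two are combined.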
