Abstract
In recent years, the availability of data from persistent data streams has grown significantly. These data streams evolve continually, and their clusters frequently form arbitrary rather than regular shapes in the data space. This characteristic leads to an exponential increase in the processing time of traditional clustering algorithms for data streams. In this study, we propose a new online, density grid-based method for data stream clustering. Its primary objectives are to reduce the number of distance function calls and to improve the cluster quality. The method runs entirely online and consists of two main phases. The first phase generates the Core Micro-Clusters (CMCs), and the second phase combines the CMCs into macro clusters. A grid is used as an outlier buffer to handle multi-density data and noise. The method was tested on real and synthetic data streams using different quality metrics and was compared with a popular method for clustering evolving data streams into arbitrary shapes. The results show that the proposed method is an effective solution for reducing the number of calls to the distance function and improving the cluster quality.
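To make the two-phase idea concrete, the following is a minimal sketch and not the authors' CEDGM implementation. It assumes a fixed micro-cluster radius, a fixed density threshold per grid cell, and a simple "centres within two radii" rule for merging; the class name `GridMicroClusterSketch` and the parameters `radius` and `min_points` are hypothetical. It illustrates why a grid buffer can save distance computations: a point that matches no existing micro-cluster is hashed to a cell instead of being compared with other buffered points.

```python
import math
from collections import defaultdict


class GridMicroClusterSketch:
    """Toy illustration only; not the CEDGM algorithm from the paper."""

    def __init__(self, radius=1.0, min_points=3):
        self.radius = radius              # assumed micro-cluster radius
        self.min_points = min_points      # assumed density threshold per grid cell
        self.centers = []                 # centres of core micro-clusters (CMCs)
        self.buffer = defaultdict(list)   # grid cell -> buffered outlier points
        self.distance_calls = 0           # counts calls to the distance function

    def _distance(self, a, b):
        self.distance_calls += 1
        return math.dist(a, b)

    def _cell(self, point):
        # Mapping a point to its grid cell needs no distance computation.
        return tuple(math.floor(x / self.radius) for x in point)

    def insert(self, point):
        # Phase 1: absorb the point if it lies inside an existing CMC.
        for center in self.centers:
            if self._distance(point, center) <= self.radius:
                return
        # Otherwise park it in the grid buffer; a dense cell becomes a new CMC.
        cell = self._cell(point)
        self.buffer[cell].append(point)
        if len(self.buffer[cell]) >= self.min_points:
            pts = self.buffer.pop(cell)
            dim = len(point)
            self.centers.append(
                tuple(sum(p[i] for p in pts) / len(pts) for i in range(dim))
            )

    def macro_clusters(self):
        # Phase 2: link CMCs whose centres are at most 2 * radius apart.
        parent = list(range(len(self.centers)))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for i in range(len(self.centers)):
            for j in range(i + 1, len(self.centers)):
                if self._distance(self.centers[i], self.centers[j]) <= 2 * self.radius:
                    parent[find(i)] = find(j)

        groups = defaultdict(list)
        for i, center in enumerate(self.centers):
            groups[find(i)].append(center)
        return list(groups.values())


# Usage: two nearby dense regions, one distant dense region, one noise point.
model = GridMicroClusterSketch(radius=1.0, min_points=3)
stream = [
    (0.1, 0.1), (0.2, 0.2), (0.3, 0.3),
    (1.5, 0.2), (1.6, 0.3), (1.7, 0.4),
    (10.2, 10.2), (10.3, 10.3), (10.4, 10.4),
    (50.5, 50.5),
]
for p in stream:
    model.insert(p)
clusters = model.macro_clusters()
```

In this toy stream the two neighbouring dense regions are merged into one macro cluster, the distant region forms its own, and the isolated point stays in the buffer as potential noise. A real evolving-stream method would also need decay or ageing of micro-clusters, which this sketch omits.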
Highlights
A prime application of big data is the Internet of Things (IoT), and its emergence is primarily due to the increase in the number of devices connected to the Internet.
In this paper, we propose the ‘Clustering of Evolving Data streams via a density Grid-based Method’ (CEDGM).
The results demonstrate that the proposed algorithm significantly improves the clustering results compared to Clustering of Evolving Data-streams into Arbitrary Shapes (CEDAS) [47] and Cauchy [64].
Summary
A prime application of big data is the Internet of Things (IoT), and its emergence is primarily due to the increase in the number of devices connected to the Internet. These devices are typically outfitted with various sensors that can collect large amounts of data in real time or several times per minute [1]–[6]. In the realm of IoT, data streams are common in many applications, such as comprehensive web search, real-time detection of anomalies in network traffic, social networks, environmental monitoring, cyber-physical systems and sensor networks. In these applications, data arrive continuously and evolve significantly over time [12]–[16].