Abstract

Various applications, such as electronic business, satellite remote sensing, intrusion detection, and network traffic monitoring, generate large, unbounded data streams at a rapid pace. The clustering of data streams has attracted considerable interest owing to the increasing prevalence of evolving data streams. Evolving data streams complicate clustering because they introduce numerous challenges, such as time and memory limits and the requirement for one-pass clustering. Furthermore, researchers need to be able to detect the arbitrarily shaped clusters present in evolving data streams from these applications. Because of these characteristics, conventional density grid-based clustering techniques cannot be applied. Moreover, existing density grid-based clustering algorithms achieve low cluster quality when clustering evolving data streams. This study conducted a systematic literature review (SLR) and identified numerous research issues encountered in solving the aforementioned problems. We summarized the grid-based clustering algorithms that have been used and identified their distinctive features and limitations. We also examined how these algorithms address the challenges of clustering evolving data streams and studied their advantages and disadvantages. The SLR was based on 104 articles published between 2010 and 2021. Numerous challenges remain for grid-based clustering algorithms, particularly in handling time limits and high-dimensional data. Finally, our findings indicate a wide range of active studies on density grid-based clustering.
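To make the density grid-based, one-pass setting concrete, here is a minimal Python sketch loosely in the spirit of grid-based stream clusterers such as D-Stream. It is an illustration under stated assumptions, not an algorithm taken from the reviewed articles. The class name `GridStreamClusterer` and its parameters are hypothetical. It assumes a fixed cell width (`cell_size`), an exponential fading factor applied per arriving point (`decay`), and a fixed density cutoff (`dense_threshold`). Each point updates only its cell's summary and is then discarded. Clusters are formed by joining adjacent dense cells, which lets them take arbitrary shapes.

```python
import math
from collections import deque


class GridStreamClusterer:
    """Hypothetical one-pass density grid clusterer (illustrative sketch only)."""

    def __init__(self, cell_size=1.0, decay=0.99, dense_threshold=1.5):
        self.cell_size = cell_size              # assumed fixed grid-cell width
        self.decay = decay                      # assumed per-arrival fading factor
        self.dense_threshold = dense_threshold  # assumed density cutoff
        self.cells = {}  # cell key -> (stored density, arrival index of last update)
        self.t = 0       # number of points seen so far

    def _key(self, point):
        # Map a d-dimensional point to the integer coordinates of its grid cell.
        return tuple(math.floor(x / self.cell_size) for x in point)

    def _current_density(self, key):
        # Apply the decay owed since the cell's last update, without storing it.
        density, last = self.cells[key]
        return density * self.decay ** (self.t - last)

    def insert(self, point):
        # One pass: only the cell summary is updated; the raw point is not kept.
        self.t += 1
        key = self._key(point)
        density, last = self.cells.get(key, (0.0, self.t))
        self.cells[key] = (density * self.decay ** (self.t - last) + 1.0, self.t)

    def clusters(self):
        # Dense cells sharing a face are joined by breadth-first search, so a
        # cluster can take any shape that a chain of dense cells can trace.
        dense = {k for k in self.cells if self._current_density(k) >= self.dense_threshold}
        seen, result = set(), []
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            queue, component = deque([start]), []
            while queue:
                cell = queue.popleft()
                component.append(cell)
                for dim in range(len(cell)):
                    for step in (-1, 1):
                        nb = cell[:dim] + (cell[dim] + step,) + cell[dim + 1:]
                        if nb in dense and nb not in seen:
                            seen.add(nb)
                            queue.append(nb)
            result.append(sorted(component))
        return sorted(result)


clusterer = GridStreamClusterer(cell_size=1.0, decay=0.99, dense_threshold=1.5)
stream = [(0.2, 0.3), (0.7, 0.1), (1.4, 0.5), (1.6, 0.2),
          (5.1, 5.2), (5.3, 5.8), (9.0, 9.0)]
for p in stream:
    clusterer.insert(p)
print(clusterer.clusters())  # [[(0, 0), (1, 0)], [(5, 5)]]
```

With these illustrative parameters, the two nearby groups of points yield two clusters. The isolated point at (9.0, 9.0) stays in a sparse cell and is treated as noise. Memory grows with the number of occupied cells rather than with the number of points. However, the number of possible cells grows exponentially with dimensionality, which is one reason high-dimensional data remains a challenge. Practical algorithms also prune cells whose decayed density becomes negligible, and this sketch omits that step.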

Highlights

  • In recent years, the fields of computational intelligence and data stream mining have grown rapidly, as mining tools and specialized extraction devices have become increasingly popular among users [1,2,3,4,5]

  • We identify open challenges and likely future research directions for evolving data stream clustering, which could guide future research toward new solution mechanisms and unexplored areas

  • We provide overviews of density grid-based clustering paradigms and data streams so that this survey is self-contained and convenient for readers

Introduction

Rapid growth has occurred in the fields of computational intelligence and data stream mining, as mining tools and specialized extraction devices have become increasingly popular among users [1,2,3,4,5]. In 2007, the amount of information created and collected globally exceeded the available storage capacity for the first time [6,7,8,9]. A growing number of data streams that demand large storage capacity, real-time monitoring, and high-frequency sampling have been produced, marking a departure from traditional static data collection [10,11,12,13]. The mining and real-time analysis of data streams have renewed research interest in identifying and developing improved processes [12,14,15,16]. Typical features of data streams are as follows [17,18,19]:

