Abstract

Data streams have become an integral part of the modern information landscape across many application domains. Stream clustering, and density-based clustering in particular, has emerged as one of the most commonly used data stream analysis tasks. Several density-based stream clustering methods have been proposed; chief among them is DenStream. Existing DenStream clustering methods usually preserve only key summary descriptors of each cluster, such as its center and radius. Such an approach is not suitable for streams that observe discrete entities, because the clustering process does not maintain the entity-level composition of each cluster over time. The primary challenge we explore in this paper is therefore how existing DenStream clustering methods can be enhanced to support entity-based stream mining in geographical space. To address this challenge, this paper presents GeoDenStream, a spatiotemporal entity-based stream clustering method. Building on DenStream, GeoDenStream is particularly suitable for clustering discrete entities because it tracks the relationship between entities and clusters over time and can recover data that has been incorrectly labeled as noise. Memory efficiency in GeoDenStream is achieved through a combination of data pruning and indexing. The performance of GeoDenStream was evaluated with both synthetic data and real-world stream data from a popular social media platform (Twitter). The results of these evaluations show that GeoDenStream efficiently handles memory constraints, overlapping data points, and false noise.
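
To make the distinction between summary descriptors and entity-level composition concrete, the following is a minimal sketch, not the GeoDenStream implementation. It keeps the standard DenStream micro-cluster summary (a weight plus linear and squared coordinate sums, faded by 2^(-lambda * dt)), from which center and radius are derived, and adds a record of which entities the micro-cluster absorbed. The class name `EntityMicroCluster`, the `members` dictionary, and all parameter names are assumptions made for illustration.

```python
import math


class EntityMicroCluster:
    """DenStream-style micro-cluster over 2-D points that also records
    which entities it absorbed (hypothetical structure, for illustration)."""

    def __init__(self, decay_rate=0.01):
        self.decay_rate = decay_rate      # lambda in the fading function 2^(-lambda * dt)
        self.weight = 0.0                 # faded count of absorbed points
        self.linear_sum = [0.0, 0.0]      # faded sum of coordinates
        self.squared_sum = [0.0, 0.0]     # faded sum of squared coordinates
        self.last_update = None           # time of the most recent insert
        self.members = {}                 # assumption: entity id -> time last seen

    def _fade(self, now):
        # Apply exponential decay for the time elapsed since the last update.
        if self.last_update is not None:
            factor = 2.0 ** (-self.decay_rate * (now - self.last_update))
            self.weight *= factor
            self.linear_sum = [v * factor for v in self.linear_sum]
            self.squared_sum = [v * factor for v in self.squared_sum]
        self.last_update = now

    def insert(self, entity_id, x, y, now):
        # Update the summary statistics and remember the contributing entity.
        self._fade(now)
        self.weight += 1.0
        self.linear_sum[0] += x
        self.linear_sum[1] += y
        self.squared_sum[0] += x * x
        self.squared_sum[1] += y * y
        self.members[entity_id] = now

    def center(self):
        return (self.linear_sum[0] / self.weight,
                self.linear_sum[1] / self.weight)

    def radius(self):
        # sqrt(|CF2| / w - |CF1 / w|^2), clamped at zero against rounding error.
        cx, cy = self.center()
        spread = sum(s / self.weight for s in self.squared_sum) - (cx * cx + cy * cy)
        return math.sqrt(max(spread, 0.0))
```

A summary-only method would keep everything except `members`; retaining that mapping is one simple way to answer which entities belong to a cluster at a given time, at the cost of extra memory that pruning and indexing would then need to control.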
