Clustering Parallel Data Streams

Yixin Chen

doi:10.5772/6447

Abstract

Massive volumes of data streams can be found in numerous applications such as network intrusion detection, financial transaction flows, telephone call records, sensor streams, and meteorological data. In recent years, demand for mining data streams has grown. Unlike finite, statically stored data sets, stream data are massive, continuous, temporally ordered, dynamically changing, and potentially infinite [5]. For example, Cortes et al. report that AT&T long distance call records consist of 300 million records per day for 100 million customers. In stream data applications, the volume of data is usually too large to be stored or to be scanned more than once. Further, the data points in a stream can only be accessed sequentially; random access is not allowed. Extensive research has been done on mining data streams, including work on stream data classification [3, 20], mining frequent patterns [9, 17, 18], and clustering stream data [1, 2, 8, 9, 10, 11, 12, 13, 14, 16, 19].

In this paper, we study the clustering of multiple, parallel data streams. Our study should be differentiated from some previous studies on clustering stream data [19, 1]: our goal is to group together multiple streams with similar behavior and trends, rather than to cluster the data records within one data stream.

There are various applications where it is desirable to cluster the streams themselves rather than the individual data records within them. For example, the price of a stock may rise and fall from time to time. To reduce financial risk, an investor may prefer to spread an investment over a number of stocks that exhibit different behaviors, that is, stocks whose price streams fall into different clusters. As another application, in meteorological study and disaster prediction, it is useful to cluster meteorological data streams from different geographical regions that show similar curvature trends in order to identify regions with similar meteorological behaviors. Yet another example is a supermarket that records sales of different items of merchandise. There may be relationships among the sales of different items, and the merchant can use these correlations to adjust prices and maximize profit.

Clustering refers to partitioning a data set into clusters such that members of the same cluster are similar in a certain sense and members of different clusters are dissimilar. Current clustering techniques can be broadly classified into several categories: partitioning methods (e.g., k-means and k-medoids), hierarchical methods (e.g., BIRCH [22]), density-based methods (e.g., DBSCAN [15]), and grid-based methods (e.g., CLIQUE [4]). However, these methods are designed only for static data sets and cannot be directly applied to data streams.
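
To make the distinction between clustering records and clustering streams concrete, the following is a minimal sketch and not the method developed in this paper. It assumes that all streams are sampled at the same time steps, that "similar behavior" can be approximated by a high Pearson correlation, and that a simple threshold rule (streams linked by a correlation of at least `threshold` end up in the same group) is an acceptable stand-in for a real clustering algorithm. The names `StreamStats`, `group_streams`, and `threshold` are hypothetical. The point of the sketch is the single-pass constraint: each reading is seen once, folded into running sums, and then discarded.

```python
import math
from itertools import combinations


class StreamStats:
    """Running sums for n parallel streams, updated in one pass.

    Assumption: every stream delivers one value per time step.
    Memory use is O(n^2) and does not grow with stream length.
    """

    def __init__(self, n_streams):
        self.n = n_streams
        self.count = 0
        self.sums = [0.0] * n_streams
        self.sq_sums = [0.0] * n_streams
        # One running cross-product per unordered pair of streams.
        self.cross = {pair: 0.0 for pair in combinations(range(n_streams), 2)}

    def update(self, values):
        """Fold in one time step; the raw values are not stored."""
        self.count += 1
        for i, v in enumerate(values):
            self.sums[i] += v
            self.sq_sums[i] += v * v
        for i, j in self.cross:
            self.cross[(i, j)] += values[i] * values[j]

    def correlation(self, i, j):
        """Pearson correlation of streams i and j over all data seen so far.

        Note: this textbook formula can lose precision on very long or
        large-valued streams; it is used here only for brevity.
        """
        if i > j:
            i, j = j, i
        t = self.count
        cov = self.cross[(i, j)] - self.sums[i] * self.sums[j] / t
        var_i = self.sq_sums[i] - self.sums[i] ** 2 / t
        var_j = self.sq_sums[j] - self.sums[j] ** 2 / t
        if var_i <= 0 or var_j <= 0:
            return 0.0  # a constant stream has no defined correlation
        return cov / math.sqrt(var_i * var_j)


def group_streams(stats, threshold=0.9):
    """Group streams by merging every pair with correlation >= threshold."""
    parent = list(range(stats.n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(stats.n), 2):
        if stats.correlation(i, j) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for s in range(stats.n):
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())


# Usage: four synthetic streams, two rising and two falling over time.
stats = StreamStats(4)
for t in range(50):
    stats.update([t, 2 * t + 1, -t, -3 * t + 5])
print(group_streams(stats))
```

In this toy run the two rising streams and the two falling streams land in separate groups. The grouping is over stream indices, not over individual readings, which is the sense in which the streams themselves are being clustered.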

Publication Date: Jan 1, 2009
Citations: 20	License type: cc-by-sa
