On clustering massive text and categorical data streams

Charu C Aggarwal, Philip S Yu

doi:10.1007/s10115-009-0241-z

Abstract

In this paper, we study the data stream clustering problem in the context of text and categorical data domains. While the clustering problem has recently been studied for numeric data streams, text and categorical data present different challenges because of the large and unordered nature of the corresponding attributes. We therefore propose algorithms for text and categorical data stream clustering. We propose a condensation-based approach that summarizes the stream into a number of fine-grained cluster droplets. These summarized droplets can be used in conjunction with a variety of user queries to construct the clusters for different input parameters, which provides an online analytical processing approach to stream clustering. We also study the problem of detecting noisy and outlier records in real time. We test the approach on a number of real and synthetic data sets and show its effectiveness relative to the baseline OSKM algorithm for stream clustering.
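
To make the idea of condensing a stream into cluster droplets more concrete, here is a minimal sketch in Python. It is not the authors' algorithm: the abstract does not specify what a droplet stores, how similarity is measured, or how outliers are flagged. The `Droplet` class, the `condense` function, the cosine similarity measure, the `threshold` and `max_droplets` parameters, and the smallest-droplet eviction rule are all assumptions chosen for illustration.

```python
import math
from collections import Counter


class Droplet:
    """Hypothetical additive summary of one fine-grained cluster."""

    def __init__(self):
        self.term_weights = Counter()  # summed term counts of absorbed records
        self.n_records = 0             # number of records absorbed so far

    def add(self, tokens):
        """Fold one record (a list of terms or categorical values) into the summary."""
        self.term_weights.update(tokens)
        self.n_records += 1

    def similarity(self, tokens):
        """Cosine similarity between a record and the droplet summary (assumed measure)."""
        record = Counter(tokens)
        dot = sum(count * self.term_weights[term] for term, count in record.items())
        norm_record = math.sqrt(sum(c * c for c in record.values()))
        norm_droplet = math.sqrt(sum(w * w for w in self.term_weights.values()))
        if norm_record == 0 or norm_droplet == 0:
            return 0.0
        return dot / (norm_record * norm_droplet)


def condense(stream, threshold=0.3, max_droplets=100):
    """Single pass over the stream; returns the droplets and flagged record positions."""
    droplets = []
    outliers = []
    for position, tokens in enumerate(stream):
        best = max(droplets, key=lambda d: d.similarity(tokens), default=None)
        if best is not None and best.similarity(tokens) >= threshold:
            best.add(tokens)
            continue
        # No droplet is close enough: flag the record as a potential outlier
        # and let it seed a new droplet (assumed policy).
        outliers.append(position)
        if len(droplets) >= max_droplets:
            # Assumed eviction rule: drop the droplet with the fewest records.
            droplets.remove(min(droplets, key=lambda d: d.n_records))
        seed = Droplet()
        seed.add(tokens)
        droplets.append(seed)
    return droplets, outliers


stream = [["a", "b"], ["a", "b", "c"], ["x", "y"], ["x", "y"], ["a", "b"]]
droplets, outliers = condense(stream)
print(len(droplets), outliers, [d.n_records for d in droplets])
```

Because each droplet is a purely additive summary, droplets can later be merged or regrouped in response to a query without revisiting the raw stream, which is the property the online analytical processing view relies on.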

Journal: Knowledge and Information Systems
Publication Date: Aug 6, 2009
Citations: 107

Similar Papers

A Framework for Projected Clustering of High Dimensional Data Streams
Charu C Aggarwal ... Jiawei Han
Proceedings 2004 VLDB Conference
01 Jan 2004

An Efficient Hybrid-Clustream Algorithm for Stream Mining
Ashish Kumar ... Ajmer Singh
01 Dec 2017

A statistical approach for clustering in streaming data
Niloofar Mozafari ... Sattar Hashemi
Artificial Intelligence Research | VOL. 3
09 Jan 2014

Multiobjective Genetic Algorithm-Based Fuzzy Clustering of Categorical Attributes
A Mukhopadhyay ... S Bandyopadhyay
IEEE Transactions on Evolutionary Computation | VOL. 13
01 Oct 2009
