Genetic algorithm for text clustering based on latent semantic indexing

Wei Song, Soon Cheol Park

doi:10.1016/j.camwa.2008.10.010

Abstract

In this paper, we develop a genetic algorithm method based on a latent semantic model (GAL) for text clustering. The main difficulty in applying genetic algorithms (GAs) to document clustering is the thousands or even tens of thousands of dimensions in the feature space, which is typical for textual data. This arises because the most straightforward and popular approach represents texts with the vector space model (VSM), in which each unique term in the vocabulary represents one dimension. Latent semantic indexing (LSI) is a successful technology in information retrieval that attempts to explore the latent semantics implied by a query or a document by representing them in a dimension-reduced space. LSI also takes into account the effects of synonymy and polysemy, and thereby constructs a semantic structure in textual data. GAs are search techniques that can efficiently evolve the optimal solution in the reduced space. We propose a variable string length genetic algorithm that automatically evolves the proper number of clusters and provides near-optimal clustering of the data set. Used in conjunction with the reduced latent semantic structure, the GA improves clustering efficiency and accuracy. The superiority of the GAL approach over a conventional GA applied in the VSM is demonstrated by good clustering results on Reuters documents.
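The dimension reduction that LSI performs can be illustrated with a truncated singular value decomposition of a term-document matrix. The following is a minimal sketch, not the authors' implementation: the vocabulary, the counts, the choice of k = 2, and the helper names `lsi_project` and `cosine` are all assumptions made for illustration.

```python
import numpy as np

# Toy term-document count matrix (rows: terms, columns: documents).
# The vocabulary and counts are invented for illustration only.
terms = ["car", "automobile", "engine", "gene", "mutation", "crossover"]
A = np.array([
    [2, 0, 1, 0, 0],
    [0, 2, 1, 0, 0],
    [1, 1, 2, 0, 0],
    [0, 0, 0, 2, 1],
    [0, 0, 0, 1, 2],
    [0, 0, 0, 1, 1],
], dtype=float)


def lsi_project(term_doc, k):
    """Return k-dimensional document vectors from a rank-k truncated SVD."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Each column of diag(s_k) @ Vt_k is one document in the latent space;
    # transpose so that each row is a document.
    return (np.diag(s[:k]) @ Vt[:k, :]).T


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


docs_vsm = A.T                   # VSM: one dimension per vocabulary term
docs_lsi = lsi_project(A, k=2)   # LSI: two latent dimensions per document

print("VSM similarity of documents 0 and 1:", cosine(docs_vsm[0], docs_vsm[1]))
print("LSI similarity of documents 0 and 1:", cosine(docs_lsi[0], docs_lsi[1]))
```

In this toy matrix, document 0 uses only "car" and document 1 uses only "automobile", so their VSM vectors overlap only through "engine". After projection they become nearly collinear, which is the synonymy effect the abstract refers to. A clustering method such as a GA would then search over the rows of `docs_lsi` instead of the full term space.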

Journal: Computers & Mathematics with Applications (Oxford, England : 1987)
Publication Date: Nov 11, 2008
Citations: 74
License type: publisher-specific-oa

Similar Papers

Analysis of Web Clustering Based on Genetic Algorithm with Latent Semantic Indexing Technology
Wei Song ... Soon Cheol Park
01 Jan 2007

Supervised Latent Semantic Indexing for Document Categorization
Jian-Tao Sun ... Yu-Chang Lu
01 Nov 2004

A Novel Document Clustering Model Based on Latent Semantic Analysis
Wei Song ... Soon Cheol Park
01 Oct 2007

A Point Symmetry-Based Clustering Technique for Automatic Evolution of Clusters
S Bandyopadhyay ... S Saha
IEEE Transactions on Knowledge and Data Engineering | VOL. 20
01 Nov 2008
