Abstract

In this paper, we propose a genetic algorithm (GA) for text clustering based on singular value decomposition (SVD). The main difficulty in applying GA to text clustering is the long string representation required in a high-dimensional space: the most straightforward and popular approach represents texts with the vector space model (VSM), in which each unique term in the vocabulary is one dimension. SVD is a well-established technique from numerical linear algebra that underlies latent semantic indexing (LSI). By employing an SVD-based document representation, LSI overcomes this problem: it uses statistically derived conceptual indices instead of individual words and provides a dimension-reduced space. GA is a search technique that automatically seeks the optimal solution of an objective (fitness) function in an optimization problem. Used in conjunction with the reduced latent semantic structure, GA can improve clustering efficiency and accuracy. We evaluate our algorithm on the Reuters document collection. The results show that SVD-based GA performs significantly better than conventional GA in the vector space model.
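The idea of running a GA in the reduced latent space can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the chromosome encoding, fitness function, or genetic operators, so everything below is an assumption. Here a chromosome is taken to be the concatenated cluster centroids in the k-dimensional LSI space, fitness is the negative within-cluster squared distance, and selection, crossover, and mutation are simple illustrative choices. The names `lsi_project`, `ga_cluster`, and `TOY_TERM_DOC` are hypothetical, and the toy matrix is made-up data, not the Reuters collection.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents); made-up counts.
# Documents 0-2 share one vocabulary, documents 3-5 another.
TOY_TERM_DOC = np.array([
    [3, 2, 2, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [2, 1, 3, 0, 0, 0],
    [0, 0, 0, 2, 3, 1],
    [0, 0, 0, 1, 2, 2],
    [0, 0, 0, 3, 1, 2],
], dtype=float)

def lsi_project(term_doc, k):
    """Map each document (column) to k latent dimensions via truncated SVD."""
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Rows of the result are documents in the reduced space (S_k V_k^T, transposed).
    return (np.diag(s[:k]) @ vt[:k]).T

def assign(docs, centroids):
    """Label each document with the index of its nearest centroid."""
    dists = np.linalg.norm(docs[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def fitness(docs, chromosome, n_clusters):
    """Assumed fitness: negative within-cluster squared distance (higher is better)."""
    centroids = chromosome.reshape(n_clusters, -1)
    labels = assign(docs, centroids)
    return -np.sum((docs - centroids[labels]) ** 2)

def ga_cluster(docs, n_clusters, pop_size=30, generations=60, seed=0):
    """Evolve centroid chromosomes; return cluster labels of the fittest one."""
    rng = np.random.default_rng(seed)
    n_docs, dim = docs.shape
    # Each chromosome starts as n_clusters randomly chosen documents, flattened.
    pop = np.stack([
        docs[rng.choice(n_docs, n_clusters, replace=False)].ravel()
        for _ in range(pop_size)
    ])
    for _ in range(generations):
        scores = np.array([fitness(docs, c, n_clusters) for c in pop])
        elite = pop[np.argsort(scores)[::-1][: pop_size // 2]]  # truncation selection
        n_children = pop_size - len(elite)
        pa = elite[rng.integers(len(elite), size=n_children)]
        pb = elite[rng.integers(len(elite), size=n_children)]
        # Uniform crossover: each gene comes from either parent with equal chance.
        children = np.where(rng.random(pa.shape) < 0.5, pa, pb)
        # Gaussian mutation applied to roughly 10% of the genes.
        mutate = rng.random(children.shape) < 0.1
        children = children + mutate * rng.normal(0.0, 0.05, children.shape)
        pop = np.vstack([elite, children])  # elites survive unchanged
    scores = np.array([fitness(docs, c, n_clusters) for c in pop])
    best = pop[scores.argmax()]
    return assign(docs, best.reshape(n_clusters, dim))

docs_2d = lsi_project(TOY_TERM_DOC, k=2)
labels = ga_cluster(docs_2d, n_clusters=2)
```

With k = 2 and two clusters, each chromosome holds only 4 real numbers, whereas the same centroid encoding in the raw vector space model would need one gene per vocabulary term per cluster. That difference in string length is the motivation the abstract gives for clustering in the SVD-reduced space.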
