Performance Evaluation of New Text Mining Method Based on GA and K-Means Clustering Algorithm

Neha Garg, R K Gupta

doi:10.1007/978-981-10-4603-2_3

Abstract

Rapid breakthroughs in technology and reduced storage costs allow individuals and organizations to generate and gather enormous amounts of text data. Extracting the documents a user is interested in from this gigantic volume of text is a tedious job. This necessitates text mining methods for discovering interesting information or knowledge in massive data. Document clustering is an effective text mining method that groups similar documents into the most relevant clusters. K-means is the best-known classical clustering algorithm. However, its results depend heavily on the initial cluster centers, and it may become trapped in local optima. This paper presents a K-means document clustering algorithm whose initial cluster centers are optimized by a genetic algorithm. Experimental studies on two different text datasets confirm that the proposed method yields more accurate clustering results than standard K-means clustering.

Full Text