An Improved Text Clustering Method based on Hybrid Model

Jinzhu Hu,Chunxiu Xiong,Jiangbo Shu,Jun Zhu,Xing Zhou

doi:10.5815/ijmecs.2009.01.05

Abstract

According to the high-dimensional sparse features on the storage of textual document, and defects existing in the clustering methods or the hybrid methods which have already been studied by now and some other problems. So an improved text clustering method based on hybrid model, that is a text clustering approach (short for TGSOM-FS-FKM) based on tree-structured growing self-organizing maps (TGSOM) and Fuzzy K-Means (FKM) is proposed. The method has optimized the clustering result through three times of clustering. It firstly makes preprocess of texts, and filters the majority of noisy words by using an unsupervised feature selection method. Then it used TGSOM to execute the first clustering to get a rough classification of texts, and to get the initial clustering number and each text's category. And then introduced LSA theory to improve the precision of clustering and reduce the dimension of the feature vector. After that, it used TGSOM to execute the second clustering to get more precise clustering results, and used supervised feature selection method to select feature items. Finally, it used FKM to cluster the result set. In the experiment, it remained the same number of feature items and experimental results indicate that TGSOM-FS-FKM clustering excels to other clustering method such as DSOM-FS-FCM, and the precision is better than DSOM-FCM, DFKCN and FDMFC clustering.

Full Text