Abstract

Synonymy and strongly related semantic information inflate the feature dimension of text vectors, which hampers both the efficiency and the precision of text classification. To reduce the feature dimension, this paper proposes a text feature extraction method based on a hybrid parallel genetic clustering algorithm. First, the K-means algorithm performs coarse-grained clustering of the feature words; next, a hybrid parallel genetic algorithm performs fine-grained clustering of the feature words; finally, the feature words in each cluster are analyzed and compressed to form a feature word set that reflects the characteristics of the text classes and their semantic information. Experiments show that the proposed method is effective for text feature extraction.
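The three stages can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on several assumptions the abstract does not specify: feature words are represented as small numeric vectors, the fine-grained stage runs separately inside each coarse cluster, a plain sequential genetic algorithm stands in for the hybrid parallel one, and "compression" is taken to mean keeping the word closest to each cluster centroid. All function names and parameters (`kmeans`, `ga_refine`, `extract_features`, `k`, `m`) are hypothetical.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans(vectors, k, rng, iters=20):
    """Coarse stage: plain Lloyd's k-means; returns one label per vector."""
    centers = rng.sample(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(v, centers[c])) for v in vectors]
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = mean(members)
    return labels

def wcss(vectors, labels):
    """Within-cluster sum of squared distances (lower is better)."""
    total = 0.0
    for c in set(labels):
        members = [v for v, l in zip(vectors, labels) if l == c]
        center = mean(members)
        total += sum(dist2(v, center) for v in members)
    return total

def ga_refine(vectors, m, rng, pop_size=20, generations=40, p_mut=0.1):
    """Fine stage (assumed form): a plain GA over label assignments.

    A chromosome assigns each vector to one of m sub-clusters; fitness is
    the within-cluster sum of squares, which the GA tries to minimize.
    """
    fitness = lambda ch: wcss(vectors, ch)
    n = len(vectors)
    pop = [[rng.randrange(m) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        nxt = pop[:2]  # elitism: carry the two best chromosomes over
        while len(nxt) < pop_size:
            # tournament selection of two parents
            a, b = (min(rng.sample(pop, 3), key=fitness) for _ in range(2))
            # uniform crossover, then per-gene mutation
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [rng.randrange(m) if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
    return min(pop, key=fitness)

def extract_features(words, vectors, k=2, m=2, seed=0):
    """Coarse k-means, GA refinement per coarse cluster, then compression:
    keep the word nearest each fine cluster's centroid (an assumption)."""
    rng = random.Random(seed)
    coarse = kmeans(vectors, k, rng)
    features = []
    for c in sorted(set(coarse)):
        idx = [i for i, l in enumerate(coarse) if l == c]
        sub = [vectors[i] for i in idx]
        fine = ga_refine(sub, min(m, len(sub)), rng)
        for f in sorted(set(fine)):
            members = [idx[j] for j, l in enumerate(fine) if l == f]
            center = mean([vectors[i] for i in members])
            rep = min(members, key=lambda i: dist2(vectors[i], center))
            features.append(words[rep])
    return features

# Toy usage with made-up 2-D word vectors (illustrative data only).
words = ["car", "auto", "truck", "van", "apple", "pear", "plum", "fig"]
vectors = [(0.0, 0.1), (0.1, 0.0), (1.0, 1.1), (1.1, 1.0),
           (5.0, 5.1), (5.1, 5.0), (6.0, 6.1), (6.1, 6.0)]
reduced = extract_features(words, vectors, k=2, m=2)
```

With `k=2` and `m=2`, the eight input words are reduced to at most four representative feature words, which is the dimension reduction the method aims for. A closer approximation of the described approach would run several GA populations in parallel and combine the GA with a local search step, but the abstract gives no details on either.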
