Abstract

Chinese text categorization, a key technology for processing massive Chinese text data, has been applied to information retrieval, document management, text filtering, and related tasks. However, categorization accuracy remains the major difficulty in advancing these applications. Feature selection is an indispensable step in Chinese text categorization and plays an important role in improving its performance. Text feature selection examines all terms in the corpus and selects the feature vector that best represents each text category. Based on real text data and the CHI (chi-square) feature selection algorithm, we propose an optimized feature selection algorithm. For data sets with unevenly distributed features, the algorithm appropriately increases the weight of words that are concentrated in a small number of documents. We use a Support Vector Machine (SVM) classifier to verify the effect of feature selection, and the experimental results show that the new feature selection algorithm performs better than existing algorithms in Chinese text categorization.
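For readers unfamiliar with CHI feature selection, the sketch below computes the standard chi-square statistic of each term with respect to one category. It illustrates only the baseline statistic, not the optimized weighting proposed here, whose formula the abstract does not state. The function names `chi_square` and `chi_scores` are hypothetical, and the documents are assumed to be already segmented into tokens.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Standard chi-square statistic from a 2x2 contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (e.g. term in every document): no information.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def chi_scores(docs, labels, category):
    """Score every term against one category.

    docs: list of token lists (assumed pre-segmented)
    labels: list of category labels, one per document
    """
    n_in = sum(1 for y in labels if y == category)
    n_out = len(labels) - n_in

    # Document frequencies inside and outside the category.
    df_in, df_out = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        target = df_in if y == category else df_out
        target.update(set(tokens))  # count each term once per document

    scores = {}
    for term in set(df_in) | set(df_out):
        a = df_in[term]
        b = df_out[term]
        scores[term] = chi_square(a, b, n_in - a, n_out - b)
    return scores
```

Terms would then be ranked by score and the top ones kept as features before training a classifier such as an SVM. A term that appears only in documents of the target category receives a high score, while a term spread evenly across categories scores zero.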
