Abstract

Taking linguistic features into further consideration, this study proposes a Chinese text categorization algorithm based on sense groups. The algorithm extracts sense groups by analyzing the syntactic and semantic properties of Chinese texts and builds a category sense-group library. An SVM is used in the text categorization experiments. The experimental results show that the precision and recall of the new sense-group-based algorithm are better than those of traditional algorithms.

Highlights

  • Text categorization is an automatic process that assigns a free text to one or more predefined classes or categories based on its content

  • Experimental assessment approach: In the study of sense-group-based text categorization, categorization is assessed on three main measures: precision, recall, and the F1 value

  • The sense-group-based Chinese text categorization algorithm takes into account the structural differences between the Chinese and English languages
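
The three assessment measures named in the highlights can be computed per category from true and predicted labels. The following is a minimal sketch using the standard definitions of precision, recall, and F1; the function name `precision_recall_f1` and the sample labels are illustrative assumptions, not taken from the study.

```python
def precision_recall_f1(true_labels, predicted_labels, category):
    """Return (precision, recall, F1) for a single category.

    Standard definitions are assumed:
      precision = TP / (TP + FP)
      recall    = TP / (TP + FN)
      F1        = 2 * precision * recall / (precision + recall)
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == category and p == category)
    fp = sum(1 for t, p in pairs if t != category and p == category)
    fn = sum(1 for t, p in pairs if t == category and p != category)

    # Guard against division by zero when a category is never predicted
    # or never occurs in the test set.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels for four test documents (not data from the study).
true_labels = ["sports", "sports", "finance", "finance"]
predicted_labels = ["sports", "finance", "finance", "finance"]
p, r, f1 = precision_recall_f1(true_labels, predicted_labels, "sports")
```

In this made-up example one of the two sports documents is recovered and nothing is wrongly labeled as sports, so precision is high while recall is lower; F1 balances the two.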


Summary

Introduction

Text categorization is an automatic process that assigns a free text to one or more predefined classes or categories based on its content. Current approaches to Chinese text categorization do not involve syntactic or semantic analysis and often perform extraction and matching at the word level, which results in low categorization accuracy (Dai et al., 2004). The sense-group-based text categorization algorithm trains on the corpus according to syntactic and semantic features and builds a category sense-group library.
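
To make the idea of a category sense-group library more concrete, here is a toy sketch. It assumes that sense groups have already been extracted and can be represented as tuples of words; the syntactic and semantic extraction step itself is not reproduced. The names `build_library` and `score_document`, the overlap-based scoring, and the sample data are all hypothetical. The scoring is a simple stand-in for illustration and is not the SVM classifier used in the study.

```python
from collections import Counter, defaultdict


def build_library(training_docs):
    """Build a category -> Counter of sense groups.

    training_docs is an iterable of (category, sense_groups) pairs,
    where each sense group is a tuple of words (an assumed representation).
    """
    library = defaultdict(Counter)
    for category, groups in training_docs:
        library[category].update(groups)
    return library


def score_document(library, groups):
    """Score a document against each category.

    The score is the share of a category's sense-group occurrences
    that also appear in the document (a simplified overlap measure).
    """
    doc_groups = set(groups)
    scores = {}
    for category, counts in library.items():
        total = sum(counts.values())
        matched = sum(counts[g] for g in doc_groups)
        scores[category] = matched / total if total else 0.0
    return scores


# Hypothetical, hand-made sense groups (not data from the study).
training_docs = [
    ("sports", [("球队", "赢得", "比赛"), ("球员", "进球")]),
    ("finance", [("股票", "上涨"), ("银行", "降息")]),
]
library = build_library(training_docs)

document = [("球员", "进球"), ("股票", "下跌")]
scores = score_document(library, document)
best_category = max(scores, key=scores.get)
```

Matching whole sense groups rather than single words is what separates this from word-level matching: in the sample document, the word 股票 alone would hint at finance, but the group (股票, 下跌) is not in the finance library, so only the sports group counts.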

Results
Conclusion
