Abstract

This paper presents a Chinese short-text classification method that takes both extended semantic constraints and statistical constraints into account. The method uses the “HowNet” tool to build the attribute set of each concept. During feature expansion, the semantic constraint is the collocation between the attribute words of the original text and the features before and after expansion. The statistical constraint is the ratio of the mutual information between the original content and the features before expansion to the mutual information between the original content and the features after expansion. These two constraints together determine whether a feature expansion is effective, so that the various semantic-relation word pairs can be used rationally in short-text classification. Experiments show that the method uses semantic relations effectively in Chinese short-text classification and improves classification performance.

Highlights

  • Short-text classification is the automatic classification of short texts (text length is usually under 160 characters)

  • To expand features for short text effectively, two issues must be resolved: (1) how to determine whether noise is introduced during expansion, and (2) how to apply different semantic-relation word pairs to short-text classification so that classification performance improves

  • This paper presents SCTCEFE (Short Chinese Text Classification Considering Effective Feature Expansion), a classification method for short Chinese text that considers effective feature expansion

Summary

INTRODUCTION

Short-text classification is the automatic classification of short texts (text length is usually under 160 characters). It is required for filtering information such as mobile phone short messages, web comments, and network chat. Experimental results in this paper show that when the features of a test text are expanded directly with the word-pair extension set [5,6], short-text classification performance does not improve and is in fact slightly reduced. To expand features for short text effectively, two issues must be resolved: (1) how to determine whether noise is introduced during expansion, and (2) how to apply different semantic-relation word pairs to short-text classification so that classification performance improves.
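The noise question in issue (1) can be illustrated with a mutual-information check of the kind the abstract calls the statistical constraint. The sketch below is only a minimal illustration under stated assumptions, not the paper's algorithm: it approximates mutual information by averaged pointwise mutual information over document co-occurrence counts, it uses a toy English corpus in place of segmented Chinese text, and the function names and the `min_ratio` threshold are hypothetical. The HowNet-based semantic constraint is not modelled here.

```python
import math
from collections import Counter
from itertools import combinations


def build_counts(docs):
    """Count document frequency of every word and every unordered word pair."""
    word_df, pair_df = Counter(), Counter()
    for doc in docs:
        words = sorted(set(doc))
        word_df.update(words)
        pair_df.update(frozenset(p) for p in combinations(words, 2))
    return word_df, pair_df


def pmi(a, b, word_df, pair_df, n_docs):
    """Pointwise mutual information; 0.0 when the words never co-occur."""
    joint = pair_df[frozenset((a, b))]
    if joint == 0:
        return 0.0
    return math.log(joint * n_docs / (word_df[a] * word_df[b]))


def context_mi(context, feature, word_df, pair_df, n_docs):
    """Average PMI between one feature and the remaining words of the text."""
    if not context:
        return 0.0
    total = sum(pmi(w, feature, word_df, pair_df, n_docs) for w in context)
    return total / len(context)


def passes_statistical_constraint(context, original, expanded, counts, n_docs,
                                  min_ratio=0.5):
    """Assumed rule: keep the expansion only if the expanded feature retains
    at least `min_ratio` of the original feature's association with the text."""
    word_df, pair_df = counts
    before = context_mi(context, original, word_df, pair_df, n_docs)
    after = context_mi(context, expanded, word_df, pair_df, n_docs)
    return before > 0 and after >= min_ratio * before


# Toy corpus (hypothetical data, English tokens for readability).
docs = [
    ["phone", "battery", "charge"],
    ["phone", "battery", "screen"],
    ["mobile", "battery", "charge"],
    ["mobile", "screen", "charge"],
    ["apple", "fruit", "juice"],
    ["apple", "fruit", "sweet"],
]
counts = build_counts(docs)
context = ["battery", "charge"]  # the short text without the feature "phone"

keep_mobile = passes_statistical_constraint(context, "phone", "mobile", counts, len(docs))
keep_apple = passes_statistical_constraint(context, "phone", "apple", counts, len(docs))
```

In this toy setting the candidate "mobile" co-occurs with the same context words as "phone" and is kept, while "apple" never co-occurs with them and is rejected as noise. A real system would tune the threshold on held-out data and combine this check with the semantic constraint.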

A CLASSIFICATION METHOD OF SHORT CHINESE
The standard of semantic constraint and statistical constraint
The Algorithm Description of Feature Expansion based on semantic relations
CONCLUSION
