Abstract
This paper presents a Chinese short-text classification method that considers extended semantic constraints and statistical constraints. The method uses the "HowNet" tool to build the attribute set of each concept. During feature expansion, the semantic constraint is the collocation between the attribute words of the original text and the features before and after expansion. The statistical constraint is the ratio of the mutual information between the original content and the features before expansion to the mutual information between the original content and the features after expansion. These two constraints together determine whether a feature expansion is effective, so that the various semantic-relation word pairs can be used rationally in short-text classification. Experiments show that the method uses semantic relations effectively in Chinese short-text classification and improves classification performance.
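The statistical constraint can be illustrated with a minimal sketch. It is not the paper's implementation: the use of positive pointwise mutual information estimated from document co-occurrence, the averaging over context words, the acceptance direction, the threshold `max_ratio`, and all function names are assumptions made for illustration. The toy corpus uses pre-tokenized English words; real Chinese text would first need word segmentation, and the HowNet-based semantic constraint is not modeled here.

```python
import math
from collections import Counter
from itertools import combinations


def build_counts(docs):
    """Return document frequencies for words and unordered word pairs."""
    word_df, pair_df = Counter(), Counter()
    for doc in docs:
        words = sorted(set(doc))
        word_df.update(words)
        pair_df.update(combinations(words, 2))
    return word_df, pair_df, len(docs)


def ppmi(a, b, word_df, pair_df, n_docs):
    """Positive pointwise mutual information (0 if the words never co-occur)."""
    joint = pair_df[tuple(sorted((a, b)))]
    if joint == 0:
        return 0.0
    pmi = math.log(joint * n_docs / (word_df[a] * word_df[b]))
    return max(0.0, pmi)


def context_mi(context, feature, counts):
    """Average PPMI between a feature and the other words of the text."""
    others = [w for w in context if w != feature]
    if not others:
        return 0.0
    return sum(ppmi(w, feature, *counts) for w in others) / len(others)


def passes_statistical_constraint(context, original, expanded, counts,
                                  max_ratio=1.5):
    """Accept the expansion if MI(before) / MI(after) stays under max_ratio.

    Both the threshold and the direction of the test are hypothetical.
    """
    mi_before = context_mi(context, original, counts)
    mi_after = context_mi(context, expanded, counts)
    if mi_after == 0.0:
        # The expanded feature has no association with the text: treat as noise.
        return False
    return mi_before / mi_after <= max_ratio


# Toy training corpus (hypothetical data).
docs = [
    ["phone", "screen", "battery"],
    ["phone", "battery", "charger"],
    ["mobile", "battery", "screen"],
    ["mobile", "charger", "screen"],
    ["apple", "fruit", "juice"],
    ["fruit", "juice", "banana"],
]
counts = build_counts(docs)
context = ["battery", "screen"]  # remaining words of the short text

print(passes_statistical_constraint(context, "phone", "mobile", counts))
print(passes_statistical_constraint(context, "phone", "fruit", counts))
```

In this toy setting, expanding "phone" to "mobile" is accepted because "mobile" co-occurs with the text's other words about as strongly as "phone" does, while expanding to "fruit" is rejected as noise because it never co-occurs with them.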
Highlights
Short-text classification is the automatic classification of short texts (the text length is usually less than 160 characters).
To expand features for short texts effectively, two issues must be resolved: (1) how to determine whether noise is introduced during expansion, and (2) how to apply different semantic-relation word pairs to short-text classification so as to improve classification performance.
This paper presents a classification method for short Chinese text that considers effective feature expansion, SCTCEFE (Short Chinese Text Classification Considering Effective Feature Expansion).
Summary
Short-text classification is the automatic classification of short texts (the text length is usually less than 160 characters). It is required for filtering information such as mobile phone short messages, web comments, and network chat. Experimental results in this paper show that when the features of the test text are expanded directly with the word-pair extension set [5,6], the classification performance for short text is not improved but slightly reduced. To expand features for short texts effectively, two issues must be resolved: (1) how to determine whether noise is introduced during expansion, and (2) how to apply different semantic-relation word pairs to short-text classification so as to improve classification performance.
More From: International Journal of Advanced Research in Artificial Intelligence