Abstract

Recent progress in applying neural networks to image classification has motivated the exploration of their applications to text classification tasks. Unlike most of this research, which is devoted to English corpora, this paper focuses on Chinese text, whose semantic representation is more intricate. As the basic unit of Chinese words, the character plays a vital role in Chinese linguistics. However, most existing Chinese text classification methods treat words as the basic unit of text representation and ignore the benefits that character features can bring. In addition, existing approaches compress all word features into a single semantic representation without an attention mechanism that could capture salient features. To tackle these issues, we propose the word-character attention model (WCAM) for Chinese text classification. WCAM integrates two levels of attention: a word-level attention model that captures salient words semantically closer to the meaning of the text, and a character-level attention model that selects the discriminative characters of the text. Both are jointly employed to learn the text representation. Meanwhile, a word-character constraint model and character alignment are introduced to ensure that the selected characters are highly representative and to enhance their discrimination. Together, these two components exploit subtle, local differences to distinguish text classes. Extensive experiments on two benchmark datasets show that WCAM achieves performance comparable to or better than state-of-the-art methods for Chinese text classification.
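The abstract does not give WCAM's equations. As a rough illustration only, the sketch below shows one common way to jointly use "two levels of attention" to represent a text. It applies plain dot-product attention pooling separately to word feature vectors and to character feature vectors, then concatenates the two pooled vectors. Several pieces are assumptions chosen for illustration, not the paper's formulation: the scoring function, the context vectors `word_ctx` and `char_ctx`, the concatenation step, and all function names. The word-character constraint model and character alignment are not modeled here.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_pool(features: List[Vector], context: Vector) -> Tuple[Vector, List[float]]:
    """Score each feature vector against a context vector (assumed learned
    in a real model), softmax-normalize the scores, and return the
    attention-weighted sum together with the weights."""
    weights = softmax([dot(h, context) for h in features])
    dim = len(features[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, features)) for d in range(dim)]
    return pooled, weights


def word_char_representation(
    word_feats: List[Vector],
    char_feats: List[Vector],
    word_ctx: Vector,
    char_ctx: Vector,
) -> Tuple[Vector, List[float], List[float]]:
    """Hypothetical two-branch text representation: word-level and
    character-level attention pooling run separately, and the two pooled
    vectors are concatenated into one text vector."""
    word_repr, word_w = attention_pool(word_feats, word_ctx)
    char_repr, char_w = attention_pool(char_feats, char_ctx)
    return word_repr + char_repr, word_w, char_w


# Toy usage with made-up 2-dimensional features (illustrative values only).
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
chars = [[0.5, 0.5], [2.0, 0.0]]
rep, word_weights, char_weights = word_char_representation(
    words, chars, word_ctx=[1.0, 0.0], char_ctx=[0.0, 1.0]
)
print(rep, word_weights, char_weights)
```

Words and characters whose features align with their context vector receive larger weights and therefore dominate the pooled vector. A downstream classifier would consume the concatenated `rep`.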
