Abstract

In text classification tasks, encoding context with only character-level or only word-level information in a single parameter space fails to extract sufficient textual features, and the resulting representations of semantic information are comparatively weak. To address this problem, this paper adopts an embedding method that fuses character- and word-level information and builds a text classification model that combines a hybrid self-attention mechanism with an RCNN. The model first segments the text into character sequences and word sequences and fuses them, then uses a Transformer encoder to extract preliminary features; the self-attention mechanism in the Transformer attends effectively to both semantic and positional information. An RCNN then extracts deep features from the fused character- and word-level representation, the resulting feature vector is scored by a softmax function, and the classification of Chinese short texts is produced. Experimental results show that the proposed method improves text classification accuracy, to varying degrees, over both single-embedding methods and single neural networks.
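To make the described pipeline concrete, the following is a minimal PyTorch sketch of the architecture as the abstract outlines it, not the authors' implementation. All dimensions, vocabulary sizes, the positional-embedding choice, and fusion by element-wise summation are assumptions; the class name `CharWordRCNN` and its parameters are hypothetical.

```python
# Sketch of the abstract's pipeline: fused char/word embeddings ->
# Transformer encoder -> RCNN -> softmax. Hyperparameters are illustrative.
import torch
import torch.nn as nn

class CharWordRCNN(nn.Module):
    def __init__(self, char_vocab=5000, word_vocab=50000, emb_dim=128,
                 hidden=128, num_classes=10, n_heads=4, n_layers=2,
                 max_len=64):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab, emb_dim, padding_idx=0)
        self.word_emb = nn.Embedding(word_vocab, emb_dim, padding_idx=0)
        self.pos_emb = nn.Embedding(max_len, emb_dim)
        # Transformer encoder for preliminary feature extraction
        layer = nn.TransformerEncoderLayer(d_model=emb_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # RCNN stage: BiLSTM context, concatenated with its input,
        # followed by max-over-time pooling (Lai et al.-style RCNN)
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.proj = nn.Linear(emb_dim + 2 * hidden, hidden)
        self.out = nn.Linear(hidden, num_classes)

    def forward(self, char_ids, word_ids):
        # Fuse character- and word-level embeddings; the paper does not
        # specify the fusion operator, so element-wise sum is assumed.
        positions = torch.arange(char_ids.size(1), device=char_ids.device)
        x = (self.char_emb(char_ids) + self.word_emb(word_ids)
             + self.pos_emb(positions))
        x = self.encoder(x)                    # (B, T, emb_dim)
        ctx, _ = self.bilstm(x)                # (B, T, 2*hidden)
        h = torch.tanh(self.proj(torch.cat([x, ctx], dim=-1)))
        h = h.max(dim=1).values                # max-over-time pooling
        return torch.log_softmax(self.out(h), dim=-1)

model = CharWordRCNN()
chars = torch.randint(1, 5000, (2, 64))   # toy batch: 2 texts, 64 positions
words = torch.randint(1, 50000, (2, 64))  # word ids assumed aligned to chars
print(model(chars, words).shape)           # torch.Size([2, 10])
```

Aligning one word id to each character position is itself an assumption; in practice the word embedding for a segmented word would be repeated across the characters it spans, or the two sequences would be fused by concatenation along the feature dimension instead.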
