Abstract

With the development of online interactive media platforms, a large amount of short text has appeared on the internet. Determining how to classify these short texts efficiently and accurately is of great significance. Graph neural networks can capture information dependencies in the entire short-text corpus, thereby enhancing feature expression and improving classification accuracy. However, existing works have overlooked the role of entities in these short texts. In this paper, we propose a heterogeneous graph-convolution-network-based short-text classification (SHGCN) method that integrates heterogeneous graph convolutional neural networks of text, entities, and words. Firstly, the model constructs a graph network of the text and extracts entity nodes and word nodes. Secondly, the relationship of the graph nodes in the heterogeneous graphs is determined by the mutual information between the words, the relationship between the documents and words, and the confidence between the words and entities. Then, the feature is represented through a word graph and combined with its BERT embedding, and the word feature is strengthened through BiLstm. Finally, the enhanced word features are combined with the document graph representation features to predict the document categories. To verify the performance of the model, experiments were conducted on the public datasets AGNews, R52, and MR. The classification accuracy of SHGCN reached 88.38%, 93.87%, and 82.87%, respectively, which is superior to that of some existing advanced classification methods.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call