Extremely Short Chinese Text Classification Method Based on Bidirectional Semantic Extension

Yongzeng Yue, Xuegang Hu, Yuhong Zhang, Peipei Li

doi:10.1088/1742-6596/1437/1/012026

Open Access

Abstract

Short text classification methods have achieved significant progress and are widely applied to text data such as Twitter and Weibo posts. However, extremely short Chinese texts such as tax invoice data differ from traditional short texts: they lack contextual semantic information, their features are sparse, and they are extremely short. Existing short text classification methods struggle to achieve satisfactory performance on such texts. To address these problems, this paper proposes a text classification method based on bidirectional semantic extension for extremely short texts such as Chinese tax invoice data. Specifically, first, a Chinese knowledge graph is introduced to bidirectionally extend the semantics of the texts and the label data, which expands the extremely short texts and eases the problem of feature sparseness; second, hash vectorization is used to avoid the semantic problems caused by the lack of contextual information. Experimental results on a real tax invoice dataset demonstrate the effectiveness of the proposed method.
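The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy knowledge graph `TOY_KG`, the helper names `extend` and `hash_vectorize`, the use of character unigrams and bigrams, and the reading of "bidirectional" as applying the same expansion to both the text and the label are all assumptions made for illustration.

```python
import hashlib

# Hypothetical toy knowledge graph: entity -> related concepts.
# A real system would query an actual Chinese knowledge graph instead.
TOY_KG = {
    "苹果": ["水果", "食品"],
    "钢笔": ["文具", "办公用品"],
}

def extend(text, kg):
    """Return the text followed by the concepts related to any KG entity it contains."""
    extended = [text]
    for entity, related in kg.items():
        if entity in text:
            extended.extend(related)
    return extended

def hash_vectorize(tokens, n_features=64):
    """Map character unigrams and bigrams to a fixed-size count vector by hashing."""
    vec = [0] * n_features
    for tok in tokens:
        grams = list(tok) + [tok[i:i + 2] for i in range(len(tok) - 1)]
        for gram in grams:
            # md5 gives a hash that is stable across runs, unlike built-in hash().
            digest = hashlib.md5(gram.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % n_features
            vec[index] += 1
    return vec

# Assumed "bidirectional" use: expand the invoice text and the label name alike.
text_vec = hash_vectorize(extend("红富士苹果", TOY_KG))
label_vec = hash_vectorize(extend("钢笔", TOY_KG))
```

Because the vector index comes from a hash rather than a learned vocabulary, the representation needs no contextual training corpus; the cost is that unrelated n-grams can collide in the same bucket when `n_features` is small.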

Full Text