Abstract

Short text classification methods have made significant progress and are widely applied to text data such as Twitter and Weibo posts. However, extremely short Chinese texts such as tax invoice data differ from traditional short texts: they lack contextual semantic information, have sparse features, and are extremely short. Existing short text classification methods struggle to achieve satisfactory performance on such texts. To address these problems, this paper proposes a text classification method based on bidirectional semantic extension for extremely short texts such as Chinese tax invoice data. Specifically, a Chinese knowledge graph is first introduced to extend the semantics bidirectionally, over both the texts and the label data, which expands the extremely short texts and eases the feature sparseness problem; second, hash vectorization is used to avoid the semantic problems caused by the lack of contextual information. Experimental results on a real tax invoice dataset demonstrate the effectiveness of the proposed method.
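The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy dictionary `TOY_KG` stands in for a real Chinese knowledge graph, the helper names (`expand`, `char_ngrams`, `hash_vectorize`) are hypothetical, and the choice of character bigrams, MD5 hashing, and 1024 buckets is an assumption made only for illustration.

```python
import hashlib

# Hypothetical stand-in for a Chinese knowledge graph: entity -> related concepts.
TOY_KG = {
    "笔记本": ["电脑", "办公设备"],
    "打印纸": ["办公用品", "纸制品"],
}


def expand(text, kg):
    """Append the related concepts of every KG entity that occurs in the text."""
    extra = [c for entity, concepts in kg.items() if entity in text for c in concepts]
    return text + " " + " ".join(extra) if extra else text


def char_ngrams(text, n=2):
    """Character n-grams per whitespace-separated token (short tokens kept whole)."""
    grams = []
    for token in text.split():
        if len(token) < n:
            grams.append(token)
        else:
            grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams


def hash_vectorize(text, n_features=1024, n=2):
    """Hashing trick: map each n-gram to a bucket index and count occurrences."""
    vec = [0] * n_features
    for gram in char_ngrams(text, n):
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % n_features
        vec[index] += 1
    return vec


# The same expansion is applied to an invoice item and to a label name,
# mirroring the "bidirectional" extension of texts and label data.
item_vec = hash_vectorize(expand("笔记本", TOY_KG))
label_vec = hash_vectorize(expand("打印纸", TOY_KG))
```

Because the hashing step needs no vocabulary and no context window, it can be applied to texts of only a few characters; the resulting count vectors can then be fed to any standard classifier.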
