RSVN: A RoBERTa Sentence Vector Normalization Scheme for Short Texts to Extract Semantic Information

Lei Gao,Lei Zhang,Lijuan Zhang,Jie Huang

doi:10.3390/app122111278

Abstract

With the explosive growth in short texts on the Web and an increasing number of Web corpora consisting of short texts, short texts are playing an important role in various Web applications. Entity linking is a crucial task in knowledge graphs and a key technology in the field of short texts that affects the accuracy of many downstream tasks in natural language processing. However, compared to long texts, the entity-linking task of Chinese short text is a challenging problem due to the serious colloquialism and insufficient contexts. Moreover, existing methods for entity linking in Chinese short text underutilize semantic information and ignore the interaction between label information and the original short text. In this paper, we propose a RoBERTa sentence vector normalization scheme for short texts to fully extract the semantic information. Firstly, the proposed model utilizes RoBERTa to fully capture contextual semantic information. Secondly, the anisotropy of RoBERTa’s output sentence vectors is revised by utilizing the standard Gaussian of flow model, which enables the sentence vectors to more precisely characterize the semantics. In addition, the interaction between label embedding and text embedding is employed to improve the NIL entity classification. Experimental results demonstrate that the proposed model outperforms existing research results and mainstream deep learning methods for entity linking in two Chinese short text datasets.

Full Text