Abstract

When deep learning is applied to natural language processing, a word embedding layer can improve task performance significantly because of the semantic information expressed in word vectors. Word embeddings can be optimized end-to-end together with the rest of the framework. However, given the number of parameters in a word embedding layer, a model trained on a small corpus can easily overfit the training set. To address this problem, pretrained embeddings obtained from a much larger corpus can be utilized to boost the performance of the current model. This paper summarizes several methods for reusing pretrained word embeddings. In addition, as corpus topics change, new words appear in a given task, and their corresponding embeddings cannot be obtained from the pretrained vectors. Therefore, to reuse word embeddings, we propose a semantic relation preserved word embedding reuse method. The proposed method first learns word relations from the current corpus; pretrained word embeddings are then utilized to help generate embeddings for the newly observed words. Experimental results verify the effectiveness of the proposed method.
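To make the two-step idea concrete, the following is a minimal sketch, not the authors' implementation: it assumes word relations are approximated by sentence-level co-occurrence counts in the task corpus, and that an embedding for a word absent from the pretrained vocabulary is formed as a relation-weighted average of the pretrained vectors of its related words. The function names, the top-k cutoff, and the random fallback are all illustrative assumptions.

```python
import numpy as np
from collections import Counter
from itertools import combinations

def cooccurrence_counts(sentences):
    """Step 1 (assumed relation learning): count how often word pairs
    co-occur in the same sentence of the task corpus."""
    counts = Counter()
    for tokens in sentences:
        for a, b in combinations(set(tokens), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def embed_new_word(word, counts, pretrained, dim=300, top_k=10):
    """Step 2 (assumed embedding generation): build a vector for `word`
    as a co-occurrence-weighted average of the pretrained vectors of
    the words it is most strongly related to."""
    related = [(w, c) for (a, w), c in counts.items()
               if a == word and w in pretrained]
    related.sort(key=lambda x: -x[1])
    related = related[:top_k]
    if not related:
        # No usable relations: fall back to a small random initialization.
        return np.random.normal(scale=0.1, size=dim)
    vecs = np.array([pretrained[w] for w, _ in related])
    weights = np.array([c for _, c in related], dtype=float)
    weights /= weights.sum()
    return weights @ vecs  # weighted average of neighbour embeddings

# Toy usage: "graphene" is new to the pretrained vocabulary.
sentences = [["graphene", "battery", "anode"], ["battery", "charge", "anode"]]
pretrained = {"battery": np.random.rand(300),
              "charge": np.random.rand(300),
              "anode": np.random.rand(300)}
counts = cooccurrence_counts(sentences)
graphene_vec = embed_new_word("graphene", counts, pretrained)
```

The generated vector can then be placed in the embedding layer alongside the reused pretrained vectors and fine-tuned end-to-end with the task model.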
