Abstract

Phrase embeddings can improve the performance of many NLP tasks. Most previous phrase-embedding methods use only the external or only the internal semantic information of phrases, so they struggle with data sparseness and have weak semantic representation ability. To address these issues, we propose an autoencoder-based method that combines pre-trained phrase embeddings with the embeddings of a phrase's component words into new phrase embeddings through complex non-linear transformations. By exploiting both the internal and external semantic information of phrases, the method generates phrase embeddings with better semantic expression ability. It can also produce well-represented phrase embeddings when only pre-trained component word embeddings are available as input, which effectively alleviates data sparseness. We design two models for this method. The first, AE-F, uses a fully connected neural network (FCNN) as both encoder and decoder. The second, AE-ALF, uses an attention mechanism whose parameters are shared between the encoder and decoder to proportionally combine the outputs of an LSTM and an FCNN. We evaluate both models on phrase similarity and phrase classification using two English datasets and two Chinese datasets. Experimental results show that, given both pre-trained phrase embeddings and component word embeddings, AE-F and AE-ALF outperform 17 baseline methods and perform similarly to each other. With only pre-trained component word embeddings, AE-F and AE-ALF still outperform most baselines, and AE-ALF performs better than AE-F.
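To make the AE-F architecture concrete, the following is a minimal PyTorch sketch of the idea described above: an autoencoder whose FCNN encoder fuses a pre-trained phrase embedding with its component word embeddings into a new phrase embedding, and whose FCNN decoder reconstructs the input for training. The dimensions, depth, activation functions, and loss are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class AEF(nn.Module):
    # Hypothetical sizes: 300-d embeddings, two component words, 512 hidden units.
    def __init__(self, word_dim=300, num_words=2, phrase_dim=300, hidden=512):
        super().__init__()
        in_dim = phrase_dim + num_words * word_dim  # concatenated input
        # Fully connected encoder: fused input -> new phrase embedding
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, phrase_dim),
        )
        # Fully connected decoder: reconstruct the original concatenation
        self.decoder = nn.Sequential(
            nn.Linear(phrase_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, in_dim),
        )

    def forward(self, phrase_emb, word_embs):
        # phrase_emb: (batch, phrase_dim); word_embs: (batch, num_words, word_dim)
        x = torch.cat([phrase_emb, word_embs.flatten(1)], dim=-1)
        z = self.encoder(x)       # new phrase embedding
        recon = self.decoder(z)   # reconstruction used for the training loss
        return z, recon

# Usage with random stand-ins for pre-trained embeddings:
model = AEF()
phrase = torch.randn(4, 300)
words = torch.randn(4, 2, 300)
z, recon = model(phrase, words)
loss = nn.functional.mse_loss(recon, torch.cat([phrase, words.flatten(1)], dim=-1))

AE-ALF would differ by passing the component word embeddings through an LSTM as well as an FCNN and weighting the two outputs with a shared-parameter attention mechanism; that variant is omitted here for brevity.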
