Abstract

Sentence modeling is a critical step in feature generation for many natural language processing (NLP) tasks. Recently, most work has generated sentence representations with models based on Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), and attention mechanisms. However, these models have two limitations: (1) they represent sentences for a single task only, by fine-tuning network parameters, and (2) they model a sentence as a concatenation of words and ignore the role of phrases. In this paper, we propose a Cascading Gated Self-attention and Phrase-attention Network (CGSPN) that generates sentence embeddings by considering both the contextual words and the key phrases in a sentence. Specifically, we first present a word-interaction gating self-attention mechanism that identifies important words and builds the relationships between words. We then cascade a phrase-attention structure that abstracts phrase-level semantics to generate the sentence representation. Experiments on different NLP tasks show that the proposed CGSPN model achieves higher accuracy than most sentence-encoding methods: it improves on the previous best result by 1.76% on the Stanford Sentiment Treebank (SST) and yields the best test accuracy on several sentence classification data sets. In the Natural Language Inference (NLI) task, CGSPN without phrase-attention outperforms the full CGSPN model and is competitive with state-of-the-art baselines, which shows the differing applicability of the proposed model across tasks. On other NLP tasks, we also compare our model with popular methods to further explore this direction.
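To make the cascading structure concrete, the following is a minimal PyTorch sketch of the two stages described above: a gated self-attention layer over words followed by an attention-pooled phrase layer. All class names, the sigmoid-gate residual mix, and the use of a 1-D convolution to abstract n-gram "phrases" are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedSelfAttention(nn.Module):
    """Self-attention over words with a learned gate that down-weights
    unimportant tokens (a sketch of word-interaction gating)."""
    def __init__(self, d_model):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.gate = nn.Linear(d_model, d_model)

    def forward(self, x):                        # x: (batch, seq_len, d_model)
        scores = self.q(x) @ self.k(x).transpose(1, 2) / x.size(-1) ** 0.5
        attn = F.softmax(scores, dim=-1)         # word-word interactions
        ctx = attn @ self.v(x)                   # contextualized words
        g = torch.sigmoid(self.gate(x))          # per-word importance gate
        return g * ctx + (1 - g) * x             # gated residual mix (assumption)

class PhraseAttention(nn.Module):
    """Attention-pools n-gram (phrase) features into one sentence vector."""
    def __init__(self, d_model, phrase_len=3):
        super().__init__()
        # A 1-D convolution abstracts local n-grams as phrase representations.
        self.conv = nn.Conv1d(d_model, d_model, phrase_len, padding=phrase_len // 2)
        self.score = nn.Linear(d_model, 1)

    def forward(self, x):                        # x: (batch, seq_len, d_model)
        phrases = torch.tanh(self.conv(x.transpose(1, 2))).transpose(1, 2)
        weights = F.softmax(self.score(phrases), dim=1)   # key-phrase weights
        return (weights * phrases).sum(dim=1)    # (batch, d_model) sentence vector

class CGSPNSketch(nn.Module):
    """Cascade: embed words, apply gated self-attention, then phrase-attention."""
    def __init__(self, vocab_size, d_model=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.word_attn = GatedSelfAttention(d_model)
        self.phrase_attn = PhraseAttention(d_model)

    def forward(self, token_ids):                # token_ids: (batch, seq_len) int64
        words = self.word_attn(self.embed(token_ids))
        return self.phrase_attn(words)           # sentence embedding

sent = torch.randint(0, 1000, (2, 12))           # two toy 12-token sentences
print(CGSPNSketch(vocab_size=1000)(sent).shape)  # torch.Size([2, 128])

The resulting fixed-size sentence embedding can then feed a task-specific classifier; the paper's NLI result suggests the phrase-attention stage can simply be bypassed for tasks where word-level interactions suffice.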
