Abstract

Recently, self-attention networks have shown strong advantages in sentence modeling across many NLP tasks. However, the self-attention mechanism computes the interaction between every pair of words independently of their positions, which makes it unable to capture the sequential relations between words at different positions in a sentence. In this paper, we improve self-attention networks by better integrating sequential relations, which are essential for modeling natural language. Specifically, we 1) propose a position-based attention to model the interaction between two words with respect to their positions; 2) perform separate attention over the context before and after the current position, respectively; and 3) merge the two parts with a position-aware gated fusion mechanism. Experiments on natural language inference, machine translation, and sentiment analysis tasks show that our sequential relation modeling helps self-attention networks outperform existing approaches. We also provide extensive analyses to shed light on what the models have learned about sequential relations.
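To make the second and third contributions concrete, below is a minimal sketch (in PyTorch, not the paper's exact formulation) of directional self-attention with a position-aware gated fusion: attention is computed separately over the left and right context of each position, and the two context vectors are mixed by a gate that also conditions on a position embedding. All module and parameter names (DirectionalGatedAttention, pos_emb, gate) are illustrative assumptions, not identifiers from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DirectionalGatedAttention(nn.Module):
    """Sketch: forward/backward masked attention fused by a position-aware gate."""

    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Learned position embeddings feeding the gate (hypothetical design choice).
        self.pos_emb = nn.Embedding(max_len, d_model)
        # Gate deciding, per position, how to mix forward and backward context.
        self.gate = nn.Linear(3 * d_model, d_model)
        self.scale = d_model ** -0.5

    def _attend(self, q, k, v, mask):
        # Standard scaled dot-product attention restricted by a boolean mask.
        scores = torch.matmul(q, k.transpose(-2, -1)) * self.scale
        scores = scores.masked_fill(~mask, float("-inf"))
        return torch.matmul(F.softmax(scores, dim=-1), v)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        b, n, _ = x.shape
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Forward mask: each position attends only to itself and earlier words.
        fwd = torch.tril(torch.ones(n, n, dtype=torch.bool, device=x.device))
        # Backward mask: each position attends only to itself and later words.
        bwd = fwd.transpose(0, 1)
        ctx_fwd = self._attend(q, k, v, fwd)
        ctx_bwd = self._attend(q, k, v, bwd)
        # Position-aware gate: mixing weights depend on the current position.
        pos = self.pos_emb(torch.arange(n, device=x.device))
        pos = pos.unsqueeze(0).expand(b, -1, -1)
        g = torch.sigmoid(self.gate(torch.cat([ctx_fwd, ctx_bwd, pos], dim=-1)))
        return g * ctx_fwd + (1 - g) * ctx_bwd


# Usage: layer = DirectionalGatedAttention(d_model=64); out = layer(torch.randn(2, 10, 64))
```

This sketch omits multi-head attention and the position-based attention scoring of contribution 1); it only illustrates how separate left/right contexts can be fused with a position-dependent gate.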
