Abstract

Recently, the self-attention mechanism at the core of the Transformer has shown its advantages in various natural language processing (NLP) tasks. Since positional information is crucial for NLP tasks, positional encoding has become a critical factor in improving the performance of the Transformer. In this paper, we present a simple but effective complex-valued relative positional encoding (CRPE) method. Specifically, we map the query and key vectors to the complex domain according to their positions. The attention weights, obtained from the dot product between the complex-valued query and key vectors, therefore directly contain relative positional information. To demonstrate the effectiveness of our method, we evaluate it on four typical NLP tasks: named entity recognition, text classification, machine translation, and language modeling, whose datasets comprise texts of varying lengths. In the experiments, our method outperforms the baseline positional encodings across all datasets. The results show that our method is more effective for both long and short texts while using fewer parameters.
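
The key property stated above is that rotating the query and key vectors into the complex domain by a position-dependent phase makes their conjugate dot product depend only on the offset between the two positions. The following minimal NumPy sketch illustrates that property; the helper name `to_complex` and the per-dimension frequencies are illustrative assumptions, not the paper's exact CRPE formulation.

```python
import numpy as np

def to_complex(x, pos, theta):
    """Rotate a real vector into the complex plane by a position-dependent
    phase (illustrative helper; not the authors' exact parameterization)."""
    return x * np.exp(1j * pos * theta)

d = 8
rng = np.random.default_rng(0)
q, k = rng.normal(size=d), rng.normal(size=d)
theta = 0.1 * np.arange(1, d + 1)    # assumed per-dimension frequencies

m, n = 7, 3                          # absolute positions of query and key
score = np.vdot(to_complex(k, n, theta), to_complex(q, m, theta))
# np.vdot conjugates its first argument, so the surviving phase is
# exp(1j * (m - n) * theta): the score depends only on the offset m - n.

m2, n2 = 12, 8                       # same offset, both positions shifted
score_shifted = np.vdot(to_complex(k, n2, theta), to_complex(q, m2, theta))
print(np.allclose(score, score_shifted))  # True: relative position only
```

Because the conjugation cancels the absolute phases, shifting both positions by the same amount leaves the attention score unchanged, which is the relative-position property the abstract refers to.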
