Abstract

The Transformer has become the predominant model in natural language processing owing to its strong performance on sequence modeling tasks, particularly its ability to handle long-term dependencies. However, conventional absolute and relative position encodings, which are not learned from data, inject position information only through an embedding layer at the input of the Transformer and therefore tend to ignore the inherent structure of natural language sequences. This paper introduces a learnable neural Ordinary Differential Equation Position Encoding (ODEPE) method that implicitly captures the natural positional relationships within a sequence without requiring additional position embeddings. ODEPE models sequences as continuous and uses a differential equation to simulate how position information evolves along the sequence, allowing positional information to flow smoothly across it. In addition, we propose an effective recurrent attention framework that combines attention with ODEPE to further improve model performance. Compared with a Transformer-based sequence modeling network, our framework improves performance by 24.0 points on the WikiText-103 dataset and by 1.06 points on the Enwik8 dataset, corresponding to relative improvements of 4.9% and 0.17%, respectively.
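
To make the idea concrete, the following is a minimal PyTorch sketch (not the authors' implementation) of an ODE-style position encoding: a small network parameterizes the derivative of a hidden "position state", and forward-Euler integration evolves that state from one token position to the next, replacing a lookup-table position embedding. The class name `ODEPositionEncoder` and the parameters `d_model` and `n_steps` are illustrative assumptions, not names from the paper.

```python
import torch
import torch.nn as nn


class ODEPositionEncoder(nn.Module):
    """Sketch of a learnable ODE-based position encoding (assumed design)."""

    def __init__(self, d_model: int, n_steps: int = 4):
        super().__init__()
        self.n_steps = n_steps
        # f(h) approximates the derivative dh/dt of the position state.
        self.f = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.Tanh(),
            nn.Linear(d_model, d_model),
        )
        # Learnable initial position state h(0).
        self.h0 = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) token embeddings.
        batch, seq_len, d_model = x.shape
        h = self.h0.expand(batch, d_model)
        dt = 1.0 / self.n_steps
        states = []
        for _ in range(seq_len):
            # Integrate the ODE over one position step with forward Euler,
            # so the position state evolves continuously along the sequence.
            for _ in range(self.n_steps):
                h = h + dt * self.f(h)
            states.append(h)
        pos = torch.stack(states, dim=1)  # (batch, seq_len, d_model)
        return x + pos                    # inject position information


if __name__ == "__main__":
    enc = ODEPositionEncoder(d_model=64)
    tokens = torch.randn(2, 10, 64)
    print(enc(tokens).shape)  # torch.Size([2, 10, 64])
```

In this sketch the position signal is generated by integrating a learned dynamics function rather than indexing a fixed table, which is the core property the abstract attributes to ODEPE; the paper's actual solver, parameterization, and integration with recurrent attention may differ.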
