Abstract
The Transformer has emerged as the predominant model in Natural Language Processing due to its strong performance on a wide range of sequence modeling tasks, particularly those involving long-term dependencies. However, traditional absolute and relative position encoding methods, which are not learned from data and inject position information only through a position embedding layer at the input of the Transformer, tend to ignore the inherent structure of natural language sequences. This paper introduces a novel learnable neural Ordinary Differential Equation Position Encoding (ODEPE) method that implicitly captures the natural positional relationships within a sequence without requiring additional position embeddings. ODEPE models the sequence as continuous and uses differential equations to simulate the evolution of position information along the sequence, enabling position information to flow seamlessly between sequences. In addition, a highly effective recurrent attention framework is proposed that combines attention with ODEPE to further improve model performance. Compared with a Transformer-based sequence modeling network, our framework improves performance by 24.0 points on the WikiText-103 dataset and by 1.06 points on the Enwik8 dataset, corresponding to improvements of 4.9% and 0.17%, respectively.
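To make the core idea of an ODE-based position encoding concrete, the sketch below shows one possible reading of it: a small network parameterizes the derivative of a position state, and integrating that derivative over the token index with a simple fixed-step Euler solver yields a continuous-position encoding for each token. This is a minimal illustration under stated assumptions, not the paper's actual architecture; the class name, the Euler integration scheme, and all hyperparameters are illustrative choices rather than details taken from the abstract.

```python
# Illustrative sketch of an ODE-style position encoding (assumed design,
# not the paper's ODEPE): a network f defines dh/dt, and integrating it
# over "time" (the token position) with Euler steps produces a position
# state for every token.
import torch
import torch.nn as nn


class ODEPositionEncoding(nn.Module):
    def __init__(self, d_model: int, steps_per_token: int = 4):
        super().__init__()
        # f parameterizes the derivative of the position state w.r.t. position.
        self.f = nn.Sequential(
            nn.Linear(d_model, d_model), nn.Tanh(), nn.Linear(d_model, d_model)
        )
        self.h0 = nn.Parameter(torch.zeros(d_model))  # learnable initial state
        self.steps_per_token = steps_per_token

    def forward(self, seq_len: int) -> torch.Tensor:
        # Integrate dh/dt = f(h) from position 0 to seq_len - 1, recording the
        # state at each integer position as that position's encoding.
        dt = 1.0 / self.steps_per_token
        h = self.h0
        encodings = [h]
        for _ in range(seq_len - 1):
            for _ in range(self.steps_per_token):
                h = h + dt * self.f(h)  # one Euler step
            encodings.append(h)
        return torch.stack(encodings)  # (seq_len, d_model)


# Usage: add the continuous position states to token embeddings.
pe = ODEPositionEncoding(d_model=64)
tokens = torch.randn(2, 10, 64)            # (batch, seq_len, d_model)
x = tokens + pe(seq_len=10).unsqueeze(0)   # broadcast over the batch
```

Because the encoding is produced by integrating a learned dynamics function rather than looking up a fixed table, it can in principle be evaluated at any (even fractional or out-of-range) position, which is the property the abstract attributes to continuous, ODE-based position modeling.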