Abstract

This paper investigates different deep learning models for various Natural Language Processing tasks. Recent research has focused on Transformer models and their variants, such as the Reformer. Recurrent Neural Network models were effective only up to a fixed window size and could not capture long-term dependencies in long sequences. To overcome this limitation, the attention mechanism was introduced and incorporated into the Transformer model. The dot-product attention in Transformers has a complexity of O(n²), where n is the sequence length, which makes the computation infeasible for very long sequences. In addition, the residual layers consume a large amount of memory because activations must be stored for back-propagation. To address these memory and efficiency limitations and to allow Transformers to learn over longer sequences, the Reformer model was introduced. Our research evaluates the performance of these two models on various Natural Language Processing tasks.
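To illustrate the quadratic cost mentioned above, the following is a minimal sketch (not taken from the paper) of scaled dot-product attention for a single head in NumPy. The (n, n) score matrix it materializes is what drives the O(n²) time and memory growth with sequence length n; the shapes and variable names here are illustrative assumptions only.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal single-head scaled dot-product attention sketch.

    Q, K, V: arrays of shape (n, d), where n is the sequence
    length and d is the head dimension.
    """
    d = Q.shape[-1]
    # The (n, n) score matrix below is the source of the O(n^2)
    # time and memory cost discussed in the abstract.
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V

# Example: doubling n quadruples the size of the score matrix.
n, d = 512, 64
Q = K = V = np.random.randn(n, d)
out = scaled_dot_product_attention(Q, K, V)  # shape (n, d)
```

The Reformer avoids building this full score matrix by using locality-sensitive hashing to attend only within buckets of similar queries and keys, and it reduces activation memory with reversible residual layers that can recompute activations during back-propagation instead of storing them.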
