https://doi.org/10.1016/j.asoc.2023.111207
Journal: Applied Soft Computing | Publication Date: Dec 29, 2023 | Citations: 1
Long-sequence text processing is time-consuming owing to ultra-large-scale self-attention computation. Recent advances demonstrate that attention in the Transformer can be accelerated by removing redundancy, and various sparse attention variants for long sequences have been proposed, leading to state-of-the-art performance on language and vision tasks. Low-rank methods have achieved outstanding success in the field of efficient Transformers. Dynamic token sparsification saves both time and cost, and can be easily extended to prune redundant spans and yield semantic features. Evolutionary algorithms are attractive for selecting hyperparameters, which are of significant importance for effectiveness. Motivated by these works, we propose an efficient Transformer model, termed EMLT, that reduces time and cost without sacrificing accuracy. EMLT effectively combines the strengths of low-rank Transformers, dynamic token sparsification, and evolutionary algorithms to further cut redundant tokens while maintaining the original precision, achieving linear memory and time complexity. We compress the Transformer in three stages. First, a sliding window serves as local attention to capture fine-grained dependency semantics. After that, a low-rank approximation of the attention matrix is applied as global attention to store long-range dependency semantics and is aggregated with the local attention. On this basis, we progressively prune redundant tokens according to their importance scores to further sparsify the attention operation. Finally, an evolutionary algorithm is used to optimize the hyperparameters of every layer. Comprehensive experiments and analysis show that our method rivals others in accuracy and outperforms them on efficiency by a large margin in terms of computational complexity.
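To make the described pipeline concrete, the following is a minimal sketch (not the authors' released EMLT code) of one such attention layer: sliding-window local attention aggregated with Linformer-style low-rank global attention, followed by importance-score token pruning. The names window, rank, and keep_ratio are hypothetical per-layer hyperparameters of the kind the paper tunes with an evolutionary algorithm; the norm-based importance score in the usage example is likewise an assumed stand-in.

import torch
import torch.nn.functional as F

def local_global_attention(x, wq, wk, wv, proj_k, proj_v, window=8):
    """x: (batch, seq, dim); proj_k/proj_v: (seq, rank) low-rank projections."""
    q, k, v = x @ wq, x @ wk, x @ wv
    d = q.shape[-1]
    n = x.shape[1]

    # Local branch: each token attends only to a sliding window around itself.
    # Implemented here as masked full attention for clarity; a banded kernel
    # brings the cost down to O(n * window).
    idx = torch.arange(n)
    band = (idx[:, None] - idx[None, :]).abs() <= window
    local_scores = (q @ k.transpose(-2, -1)) / d ** 0.5
    local_scores = local_scores.masked_fill(~band, float("-inf"))
    local_out = F.softmax(local_scores, dim=-1) @ v

    # Global branch: project keys/values down to `rank` positions (low-rank
    # approximation of the attention matrix), giving linear cost in seq length.
    k_low = proj_k.T @ k                      # (batch, rank, dim)
    v_low = proj_v.T @ v                      # (batch, rank, dim)
    global_scores = (q @ k_low.transpose(-2, -1)) / d ** 0.5
    global_out = F.softmax(global_scores, dim=-1) @ v_low

    # Aggregate fine-grained local context with long-range global context.
    return local_out + global_out

def prune_tokens(x, scores, keep_ratio=0.5):
    """Keep the top keep_ratio fraction of tokens by importance score."""
    k = max(1, int(x.shape[1] * keep_ratio))
    keep = scores.topk(k, dim=1).indices.sort(dim=1).values   # preserve order
    return torch.gather(x, 1, keep.unsqueeze(-1).expand(-1, -1, x.shape[-1]))

# Example usage with random weights (dim=16, rank=4):
b, n, d, r = 2, 32, 16, 4
x = torch.randn(b, n, d)
wq, wk, wv = torch.randn(d, d), torch.randn(d, d), torch.randn(d, d)
proj_k, proj_v = torch.randn(n, r), torch.randn(n, r)
out = local_global_attention(x, wq, wk, wv, proj_k, proj_v, window=4)
pruned = prune_tokens(out, out.norm(dim=-1), keep_ratio=0.5)

Because the global branch works on rank-many projected positions and the local branch on a fixed window, both memory and time grow linearly with sequence length; the evolutionary search would then choose window, rank, and keep_ratio independently for each layer.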