Abstract
Processing long text sequences is time-consuming owing to the large-scale self-attention computation it requires. Recent advances demonstrate that attention in Transformers can be accelerated by removing redundancy, and various sparse attention variants for long sequences have been proposed, leading to state-of-the-art performance on language and vision tasks. Low-rank methods have achieved outstanding success in the field of efficient Transformers. Dynamic token sparsification saves both time and cost, and can be easily extended to prune redundant spans and to yield semantic features. Evolutionary algorithms are attractive for selecting hyperparameters, which are of significant importance for effectiveness. Motivated by these works, we propose an efficient Transformer model, termed EMLT, to reduce time and cost without sacrificing accuracy. EMLT effectively combines the strengths of low-rank Transformers, dynamic token sparsification, and evolutionary algorithms to further remove redundant tokens while maintaining the original precision, achieving linear memory and time complexity. We compress the Transformer in three stages. First, a sliding window serves as local attention to capture fine-grained dependency semantics. Next, a low-rank approximation of the attention matrix is applied as global attention to store long-range dependency semantics and is aggregated with the local attention. On this basis, we progressively prune redundant tokens according to an importance score to further sparsify the attention operation. Finally, an evolutionary algorithm is used to optimize the hyperparameters of every layer. The results of comprehensive experiments and analysis show that our method rivals others in accuracy and outperforms them by a large margin in efficiency, in terms of computational complexity.
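To make the described pipeline concrete, the following is a minimal single-head PyTorch sketch of the attention stages the abstract outlines: sliding-window local attention aggregated with a low-rank global attention, followed by importance-score-based token pruning. It is an illustration under assumptions, not the authors' released implementation: the class name EMLTBlockSketch, the pooled-landmark low-rank surrogate, the attention-based importance score, and defaults such as window_size, num_landmarks, and keep_ratio are all hypothetical, and the evolutionary hyperparameter search over per-layer settings is not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EMLTBlockSketch(nn.Module):
    """Illustrative sketch only: local + low-rank global attention with
    token pruning. Names and defaults are assumptions, not the paper's code."""

    def __init__(self, dim, window_size=64, num_landmarks=32, keep_ratio=0.7):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.window_size = window_size
        self.num_landmarks = num_landmarks
        self.keep_ratio = keep_ratio

    def forward(self, x):
        B, N, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scale = D ** -0.5

        # Local attention: each query attends only to a sliding window of keys.
        # (For clarity the full band mask is materialized; a practical version
        # would compute the band blockwise to keep memory linear in N.)
        idx = torch.arange(N, device=x.device)
        band = (idx[None, :] - idx[:, None]).abs() <= self.window_size // 2
        local = (q @ k.transpose(-2, -1)) * scale
        local = local.masked_fill(~band, float("-inf"))
        local_out = F.softmax(local, dim=-1) @ v

        # Global attention: approximate the full attention matrix with a small
        # set of pooled "landmark" keys/values, acting as a low-rank surrogate.
        k_land = F.adaptive_avg_pool1d(k.transpose(1, 2), self.num_landmarks).transpose(1, 2)
        v_land = F.adaptive_avg_pool1d(v.transpose(1, 2), self.num_landmarks).transpose(1, 2)
        global_attn = F.softmax((q @ k_land.transpose(-2, -1)) * scale, dim=-1)
        global_out = global_attn @ v_land

        # Aggregate local and global branches.
        out = self.proj(local_out + global_out)

        # Token pruning: one possible importance score is each token's strongest
        # affinity to any landmark; keep the top keep_ratio fraction, in order.
        importance = global_attn.amax(dim=-1)                     # (B, N)
        n_keep = max(1, int(N * self.keep_ratio))
        keep = importance.topk(n_keep, dim=-1).indices.sort(dim=-1).values
        out = out.gather(1, keep.unsqueeze(-1).expand(-1, -1, D))
        return out, keep


# Example usage: a batch of 2 sequences of length 512 is reduced to 358 tokens.
x = torch.randn(2, 512, 256)
block = EMLTBlockSketch(dim=256)
y, kept = block(x)  # y: (2, 358, 256), kept: indices of retained tokens
```

Because each query only interacts with a fixed-size window plus a fixed number of landmarks, and a fixed fraction of tokens is dropped per block, the per-layer cost in this sketch grows linearly with sequence length, which mirrors the complexity claim in the abstract.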