Abstract

Transformers have become state-of-the-art architectures for a wide range of tasks in Natural Language Processing (NLP) and Computer Vision (CV); however, their space and computational complexity present significant challenges for real-world applications. A promising approach to addressing these issues is the introduction of sparsity, which involves the deliberate removal of certain parameters or activations from the neural network. In this systematic literature review, we aimed to provide a comprehensive overview of current research on sparsity in transformers. We analyzed the different sparsity techniques applied to transformers, their impact on model performance, and their efficiency in terms of time and space complexity. Moreover, we identified the major gaps and challenges in the existing literature. Our study also highlighted the importance of investigating sparsity in transformers for computational efficiency, reduced resource requirements, scalability, environmental impact, and hardware-algorithm co-design. By synthesizing the current state of research on sparsity in transformer-based models, we provided valuable insights into their efficiency, impact on model performance, and potential trade-offs, helping to advance the field.
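
To make the notion of parameter sparsity mentioned above concrete, the sketch below applies simple magnitude-based weight pruning to a matrix. It is a minimal, hypothetical NumPy illustration of the general idea, not a method drawn from the reviewed literature; the function name and the pruning ratio are assumptions for the example.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries of a weight matrix.

    `sparsity` is the fraction of entries to remove (e.g. 0.9 removes 90%).
    Illustrative only: real pruning schemes for transformers are usually
    structured, iterative, and followed by fine-tuning.
    """
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # Threshold below which entries are treated as unimportant.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Example: prune 90% of a random 4x4 projection matrix.
W = np.random.randn(4, 4)
W_sparse = magnitude_prune(W, sparsity=0.9)
print(np.count_nonzero(W_sparse), "non-zero entries remain")
```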
