Abstract
With the rapid development of technologies and the growing number of neural network applications, the problem of optimization arises. Among the methods for reducing training and inference time, neural network pruning has attracted considerable attention in recent years. The main goal of pruning is to reduce the computational complexity of neural network models while keeping performance metrics at a desired level. Among the various approaches to pruning, Single-shot Network Pruning (SNIP) was designed as a straightforward and effective way to reduce the number of parameters before training. However, as neural network architectures have evolved, particularly with the growing popularity of transformers, a need to reevaluate traditional pruning methods has arisen. This paper revisits the SNIP pruning method, evaluates its performance on a transformer model, and introduces an enhanced version of SNIP designed specifically for transformer architectures. The paper outlines the mathematical framework of the SNIP algorithm and proposes a modification based on the specifics of transformer models. Transformer models have achieved impressive results, owing to their attention mechanisms, on a multitude of tasks such as language modeling, translation, and computer vision. The proposed modification takes this unique feature into account and combines it with traditional loss gradients. The original method computes an importance score for the network's weights using only gradients of the loss function. In the enhanced version, the importance score is a composite metric that incorporates not only the gradient of the loss function but also gradients of the attention activations. To evaluate the efficiency of the proposed modifications, a series of experiments was conducted on an image classification task using the Linformer variant of the transformer architecture.
The experimental results demonstrate the efficiency of incorporating attention scores into pruning. The model pruned with the modified algorithm outperforms the model pruned with the original SNIP by 34% in validation accuracy, confirming the validity of the introduced improvements.
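As a rough sketch of the idea described above (not the paper's implementation), the following NumPy code computes the standard SNIP connection sensitivity |g · w| and a hypothetical composite score that blends it with an attention-gradient term via an assumed coefficient `alpha`; the function names and the top-k masking scheme are illustrative assumptions.

```python
import numpy as np

def snip_saliency(weights, loss_grads):
    # SNIP connection sensitivity: |dL/dc_j| = |g_j * w_j|, normalized to sum to 1
    s = np.abs(weights * loss_grads)
    return s / s.sum()

def combined_saliency(weights, loss_grads, attn_grads, alpha=0.5):
    # Hypothetical composite score: blend loss-gradient saliency with
    # a saliency derived from attention-activation gradients (alpha is assumed)
    s_loss = snip_saliency(weights, loss_grads)
    s_attn = np.abs(weights * attn_grads)
    s_attn = s_attn / s_attn.sum()
    return alpha * s_loss + (1.0 - alpha) * s_attn

def prune_mask(saliency, keep_frac):
    # Keep the top keep_frac fraction of connections by saliency
    k = max(1, int(round(keep_frac * saliency.size)))
    thresh = np.sort(saliency.ravel())[-k]
    return saliency >= thresh
```

Applying `prune_mask(snip_saliency(w, g), 0.1)` before training would retain only the 10% of connections with the highest sensitivity, which is the single-shot, pre-training character of SNIP that the paper builds on.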