Abstract

Transformer-based models have achieved state-of-the-art performance on natural language understanding (NLU) tasks by learning important token relationships through the attention mechanism. However, we observe that attention can become overly distributed during fine-tuning, failing to adequately preserve the dependencies between meaningful tokens. This phenomenon negatively affects the learning of token relationships in sentences. To overcome this issue, we propose a methodology that embeds word importance (WI) in transformer-based models as a new layer, weighting words according to their importance. Our simple yet powerful approach offers a general technique to boost transformer model capabilities on NLU tasks by mitigating the risk of attention dispersion during fine-tuning. Through extensive experiments on the GLUE, SuperGLUE, and SQuAD benchmarks for pre-trained models (BERT, RoBERTa, ELECTRA, and DeBERTa), and the MMLU, Big Bench Hard, and DROP benchmarks for the large language model Llama2, we validate the effectiveness of our method in consistently enhancing performance across models with negligible overhead. Furthermore, we confirm that our WI layer better preserves the dependencies between important tokens than standard fine-tuning by introducing a model that classifies dependent tokens from the learned attention weights. The code is available at https://github.com/bigbases/WordImportance.
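The abstract does not spell out how the WI layer is implemented; the authors' code is in the linked repository. Below is only a minimal sketch, assuming the layer scores each token's hidden state, normalizes the scores over non-padding positions, and re-weights the token representations with a residual connection. The class name `WordImportanceLayer`, the scalar scoring head, and the softmax normalization are illustrative assumptions, not the published implementation.

```python
# Hypothetical sketch of a word-importance (WI) weighting layer.
# The structure below is an assumption for illustration, not the authors' released code.
import torch
import torch.nn as nn


class WordImportanceLayer(nn.Module):
    """Re-weights token representations by learned per-token importance scores."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # Project each token's hidden state to a scalar importance score.
        self.scorer = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size)
        # attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
        scores = self.scorer(hidden_states).squeeze(-1)             # (batch, seq_len)
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)       # (batch, seq_len, 1)
        # Residual connection keeps the original signal while emphasizing important tokens.
        return hidden_states + weights * hidden_states
```

In this sketch the layer would be inserted between a transformer encoder's output and the task head, so that downstream fine-tuning sees representations in which important tokens are amplified rather than diluted by dispersed attention.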
