AMPLIFY: attention-based mixup for performance improvement and label smoothing in transformer

Leixin Yang, Yu Xiang

doi:10.7717/peerj-cs.2011

Abstract

Mixup is an effective data augmentation method that generates new augmented samples by taking linear combinations of different original samples. However, if the original samples contain noise or aberrant features, mixup may propagate them to the augmented samples, making the model over-sensitive to these outliers. To solve this problem, this paper proposes a new mixup method called AMPLIFY. The method uses the Transformer's own attention mechanism to reduce the influence of noise and aberrant values in the original samples on the prediction results. It adds no trainable parameters and has a very low computational cost, thereby avoiding the high resource consumption of common mixup methods such as Sentence Mixup. Experimental results show that, at a lower computational cost, AMPLIFY outperforms other mixup methods on text classification tasks across seven benchmark datasets, offering a new way to further improve the performance of attention-based pre-trained models such as BERT, ALBERT, RoBERTa, and GPT. The code is available at https://github.com/kiwi-lilo/AMPLIFY.
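
For readers unfamiliar with the baseline, the sketch below illustrates standard input-level mixup, that is, the "linear combinations of original samples" the abstract starts from. It is not the AMPLIFY method, whose attention-based mixing is not specified in this abstract. The function names `mixup` and `sample_lambda`, the toy vectors, and the default `alpha=0.2` are all illustrative assumptions.

```python
import random


def mixup(x_i, y_i, x_j, y_j, lam):
    """Blend two (features, one-hot label) pairs with weight lam in [0, 1].

    Hypothetical helper illustrating standard mixup, not the AMPLIFY method.
    """
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix


def sample_lambda(alpha=0.2, rng=random):
    """Draw the mixing weight from Beta(alpha, alpha); alpha=0.2 is an assumed default."""
    return rng.betavariate(alpha, alpha)


# Toy samples: sample b carries an unusually large second feature (4.0).
x_a, y_a = [1.0, 0.0, 2.0], [1.0, 0.0]
x_b, y_b = [0.0, 4.0, 2.0], [0.0, 1.0]

# A fixed weight keeps the example deterministic; training code would
# normally call sample_lambda() for each pair or batch.
x_mix, y_mix = mixup(x_a, y_a, x_b, y_b, lam=0.75)
print(x_mix)  # [0.75, 1.0, 2.0]
print(y_mix)  # [0.75, 0.25]
```

Note that the large second feature of sample b still contributes to the mixed sample (1.0 instead of 0.0), which is the propagation of aberrant features that the abstract describes. The soft label `[0.75, 0.25]` is also where the label-smoothing effect of mixup comes from.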

Journal: PeerJ Computer Science
Publication Date: Apr 30, 2024
License type: CC BY 4.0

Similar Papers

Confidence-aware calibration and scoring functions for curriculum learning
Shuang Ao ... Advaith Siddharthan
07 Jun 2023

Semi-supervised Text Classification Based On Graph Attention Neural Networks
Jian Huang ... Jing Wang
28 May 2021

BPAM: Recommendation Based on BP Neural Network with Attention Mechanism
Wu-Dong Xi ... Chang-Dong Wang
01 Aug 2019

Add a SideNet to your MainNet
Adrien Morisot
Proceedings of the Northern Lights Deep Learning Workshop | VOL. 3
28 Mar 2022
