Randomness Regularization With Simple Consistency Training for Neural Networks.

Juntao Li, Yue Wang, Min Zhang, Qi Meng, Lijun Wu, Tao Qin, Xiaobo Liang, Tie-Yan Liu

doi:10.1109/tpami.2024.3370716

Abstract

Randomness is widely introduced in neural network training to simplify model optimization or avoid over-fitting. Among such techniques, dropout and its variants along different dimensions (e.g., data, model structure) are prevalent for regularizing the training of deep neural networks. Though effective, the randomness introduced by these dropout-based methods causes a non-negligible inconsistency between training and inference. In this paper, we introduce a simple consistency training strategy to regularize such randomness, namely R-Drop, which forces two output distributions sampled by each type of randomness to be consistent. Specifically, R-Drop minimizes the bidirectional KL-divergence between two output distributions produced by dropout-based randomness for each training sample. Theoretical analysis reveals that R-Drop can reduce the above inconsistency by reducing the inconsistency among the sampled sub-structures and bridging the gap between the loss calculated by the full model and that calculated by the sub-structures. Experiments on 7 widely used deep learning tasks (23 datasets in total) demonstrate that R-Drop is universally effective for different types of neural networks (i.e., feed-forward, recurrent, and graph neural networks) and different learning paradigms (supervised, parameter-efficient, and semi-supervised). In particular, it achieves state-of-the-art performance with the vanilla Transformer model on WMT14 English → German translation (30.91 BLEU) and WMT14 English → French translation (43.95 BLEU), even surpassing models trained with extra large-scale data and expert-designed advanced variants of the Transformer.
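The consistency term described in the abstract can be illustrated with a minimal, dependency-free sketch. It runs the same input through a toy classifier twice with independent dropout masks and adds a symmetric KL penalty between the two predicted distributions to the usual negative log-likelihood. The toy one-layer model, the function names, the weighting factor `alpha`, and the 0.5 averaging of the two KL directions are all assumptions made for illustration; the abstract does not specify them.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps)) for pi, qi in zip(p, q))


def dropout(values, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest.

    Assumes 0 <= rate < 1.
    """
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def forward(x, weights, rate, rng):
    """Hypothetical one-layer classifier: dropout on the input, then a linear map."""
    h = dropout(x, rate, rng)
    logits = [sum(w * hi for w, hi in zip(row, h)) for row in weights]
    return softmax(logits)


def r_drop_loss(x, label, weights, rate=0.3, alpha=1.0, rng=None):
    """NLL of two dropout passes plus a bidirectional KL consistency term.

    `alpha` and the 0.5 averaging are illustrative assumptions.
    Returns (total_loss, symmetric_kl).
    """
    rng = rng or random.Random(0)
    p1 = forward(x, weights, rate, rng)  # first sampled sub-structure
    p2 = forward(x, weights, rate, rng)  # second sampled sub-structure
    nll = -(math.log(p1[label] + 1e-12) + math.log(p2[label] + 1e-12))
    sym_kl = 0.5 * (kl_divergence(p1, p2) + kl_divergence(p2, p1))
    return nll + alpha * sym_kl, sym_kl
```

With the dropout rate set to 0 both passes see the same sub-structure, so the consistency term vanishes and only the likelihood term remains; with a positive rate the term penalizes disagreement between the two sampled sub-structures.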

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Publication Date: Aug 1, 2024
Citations: 1

Similar Papers

Scalable algorithms for physics-informed neural and graph networks
Khemraj Shukla ... George E Karniadakis
Data-Centric Engineering | VOL. 3
01 Jan 2021

Neuroevolution in Deep Neural Networks: Current Trends and Future Challenges
Edgar Galvan ... Peter Mooney
IEEE Transactions on Artificial Intelligence | VOL. 2
04 May 2021

Self-Supervised Learning with Graph Neural Networks for Region of Interest Retrieval in Histopathology
29 Dec 2020

Self-Supervised Learning with Graph Neural Networks for Region of Interest Retrieval in Histopathology
Yigit Ozen ... Sevgen Onder
10 Jan 2021
