Abstract

Recurrent neural networks (RNNs) are an efficient way of training language models, and various RNN architectures have been proposed to improve performance. However, as network scales grow, the overfitting problem becomes more pressing. In this paper, we propose a framework—G2Basy—to speed up the training process and ease the overfitting problem. Instead of using predefined hyperparameters, we devise a gradient increasing and decreasing technique that changes the training batch size and input dropout simultaneously by a user-defined step size. Together with a pretrained word embedding initialization procedure and the introduction of different optimizers at different learning rates, our framework speeds up the training process dramatically and improves performance compared with a benchmark model of the same scale. For the word embedding initialization, we propose the concept of "artificial features" to describe the characteristics of the obtained word embeddings. We experiment on two of the most widely used corpora—the Penn Treebank and WikiText-2 datasets—and outperform the benchmark results on both, with potential for further improvement. Furthermore, our framework shows better results on the larger and more complex WikiText-2 corpus than on the Penn Treebank. Compared with other state-of-the-art results, we achieve comparable performance with network scales hundreds of times smaller and within fewer training epochs.
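The core scheduling idea—growing the batch size while shrinking the input dropout in lockstep by a user-defined step—can be sketched as a simple per-epoch schedule. The function name, default values, and clamping bounds below are illustrative assumptions, not the paper's actual hyperparameters:

```python
def g2basy_schedule(epochs, base_batch=20, batch_step=10,
                    base_dropout=0.5, dropout_step=0.05,
                    max_batch=80, min_dropout=0.1):
    """Yield a (batch_size, input_dropout) pair for each epoch.

    Batch size increases and input dropout decreases together by a
    user-defined step size, mimicking the gradient increasing and
    decreasing technique described in the abstract.  All defaults
    here are assumed values for illustration only.
    """
    for epoch in range(epochs):
        # Step both hyperparameters simultaneously, clamped to bounds.
        batch = min(base_batch + epoch * batch_step, max_batch)
        dropout = max(base_dropout - epoch * dropout_step, min_dropout)
        yield batch, dropout


# Example: print the schedule for the first five epochs.
for batch, dropout in g2basy_schedule(5):
    print(batch, round(dropout, 2))
```

A training loop would reload its data iterator with the new batch size and reset the input-layer dropout probability at the start of each epoch, rather than fixing both for the whole run.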

Highlights

  • Natural language processing (NLP) is the area of artificial intelligence that concerns the automatic generation and understanding of human languages [1]

  • Language models are an essential part of NLP that can predict upcoming words based on a given context [2]

  • ASGD still works if we introduce it at a learning rate of 0.3125, but the training soon begins to overfit after a few epochs

Summary

Introduction

Natural language processing (NLP) is the area of artificial intelligence that concerns the automatic generation and understanding of human languages [1]. To alleviate the overfitting problem and enhance the generalization ability of language models, mechanisms like tied weights [12] and dropout [13], as well as a wide variety of optimization algorithms, such as Momentum [14], Adadelta [15], and Adam [16], have been proposed. However, these techniques do not work well on RNNs, especially on LSTM networks [17], which are designed to solve tasks with long time lags. Our framework uses pretrained GloVe word embeddings to initialize its input vectors and changes optimization algorithms during training. Compared with other state-of-the-art regularized multilayer RNN models of much larger scale, our framework still achieves close results.
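The optimizer-switching procedure can be sketched as a decision rule: train with plain SGD at a high learning rate, anneal the rate, and switch to averaged SGD (ASGD) once the rate reaches a threshold. The 0.3125 threshold comes from the highlights above; the halving schedule and function names are assumptions for illustration:

```python
def choose_optimizer(learning_rate, asgd_threshold=0.3125):
    """Pick the optimizer for the current learning rate.

    Start with plain SGD at a high rate; once the rate has been
    annealed down to the threshold (0.3125 per the highlights),
    switch to averaged SGD.  The rule itself is a simplified sketch.
    """
    return "ASGD" if learning_rate <= asgd_threshold else "SGD"


# Example: anneal the learning rate by halving (an assumed schedule)
# and record where the switch to ASGD happens.
lr = 20.0
trace = []
while lr >= 0.3125:
    trace.append((lr, choose_optimizer(lr)))
    lr /= 2
print(trace[-1])
```

In an actual training loop, the switch would also carry over the model parameters—ASGD additionally maintains an averaged copy of the weights, which is what gives it its regularizing effect late in training.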

Materials and methods
Experiment datasets
Accumulate Updates
Results and discussion
Conclusion
