Abstract

Language models based on Long Short-Term Memory (LSTM) neural networks have been widely applied in automatic speech recognition and natural language processing research. However, current LSTMs are trained with the cross-entropy loss function, which considers only the target category and ignores the competing categories during training. Consequently, current training methods cannot fully exploit the discriminative information provided by the data labels. To tackle this problem, we propose a Large Margin Long Short-Term Memory Neural Network (LMLSTM) model in this paper. Our model employs the large margin discriminative principle as a heuristic term to guide the convergence process during training, improving the discriminative ability of the original LSTM while preserving its capability to generate sequential data. We evaluated the proposed large margin term on the Penn Treebank corpus language modelling task. Experimental results demonstrate that the proposed LMLSTM model outperforms current LSTM models in terms of accuracy and perplexity without increasing model depth.
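To illustrate the idea described above, a minimal PyTorch-style sketch of a cross-entropy loss augmented with a hinge-style large margin term over competing categories might look like the following. The function name and the hyperparameters `margin` and `lam` are our assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def large_margin_cross_entropy(logits, targets, margin=1.0, lam=0.1):
    """Cross-entropy plus a hinge-style large margin penalty.

    The margin term is active whenever the target logit does not exceed
    the best competing logit by at least `margin`, so competing
    categories influence training directly.

    Note: `margin` and `lam` are illustrative values, not the paper's.
    """
    # Standard cross-entropy over all categories.
    ce = F.cross_entropy(logits, targets)

    # Logit of the target category for each example.
    target_logit = logits.gather(1, targets.unsqueeze(1)).squeeze(1)

    # Best competing logit: mask out the target class, then take the max.
    masked = logits.clone()
    masked.scatter_(1, targets.unsqueeze(1), float("-inf"))
    best_competitor = masked.max(dim=1).values

    # Hinge penalty, active only when the margin is violated.
    margin_term = F.relu(margin - (target_logit - best_competitor)).mean()

    return ce + lam * margin_term
```

Under this formulation, `lam` trades off the generative (cross-entropy) objective against the discriminative margin term, which matches the abstract's goal of sharpening discrimination while preserving sequence-generation ability.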
