Abstract
Recurrent neural network language models (RNNLMs) have become an increasingly popular choice for state-of-the-art speech recognition systems. RNNLMs are normally trained by minimizing the cross entropy (CE) criterion using the stochastic gradient descent (SGD) algorithm. SGD uses only first-order derivatives; no higher-order gradient information is exploited to model the correlation between parameters, so it cannot fully capture the curvature of the error cost function. This can lead to slow convergence in model training. In this paper, a limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) based second-order optimization technique is proposed for RNNLMs. This method efficiently approximates the matrix-vector product between the inverse Hessian and the gradient vector via a recursion over past gradients with a compact memory requirement. Consistent perplexity and error rate reductions are obtained over the SGD method on two speech recognition tasks: Switchboard English and Babel Cantonese. Faster convergence and a speed-up in RNNLM training time are also obtained.

Index Terms: recurrent neural network, language model, second order optimization, limited-memory BFGS, speech recognition
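The inverse-Hessian-times-gradient approximation described above is conventionally computed with the L-BFGS two-loop recursion over a short history of curvature pairs. The sketch below, in NumPy, illustrates that standard recursion only; the paper's exact variant (history length, initial Hessian scaling, line search) is not specified in the abstract, so those details here are assumptions.

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Approximate H^{-1} @ grad via the standard L-BFGS two-loop recursion.

    grad   : current gradient vector
    s_list : recent parameter differences s_i = x_{i+1} - x_i (oldest first)
    y_list : recent gradient differences  y_i = g_{i+1} - g_i (oldest first)
    Returns an approximation of the inverse Hessian applied to grad;
    the update direction is the negative of the returned vector.
    """
    q = grad.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]

    # First loop: newest to oldest stored curvature pair.
    alphas = []
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        alpha = rho * np.dot(s, q)
        q -= alpha * y
        alphas.append(alpha)
    alphas.reverse()  # restore oldest-first order for the second loop

    # Initial Hessian approximation: a common scaling heuristic (an assumption here).
    if s_list:
        gamma = np.dot(s_list[-1], y_list[-1]) / np.dot(y_list[-1], y_list[-1])
    else:
        gamma = 1.0
    r = gamma * q

    # Second loop: oldest to newest stored curvature pair.
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), alphas):
        beta = rho * np.dot(y, r)
        r += (alpha - beta) * s

    return r
```

Because only the last few (s, y) pairs are stored rather than an explicit Hessian, the memory cost is a small multiple of the model's parameter count, which is the compact memory requirement the abstract refers to.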