Abstract

Long short-term memory (LSTM) language models (LMs) have been widely investigated for automatic speech recognition (ASR) and natural language processing (NLP). Although they achieve excellent performance on large vocabulary tasks, their large memory footprint prohibits the use of LSTM LMs on low-resource devices. The memory consumption comes mainly from the word embedding layer. In this paper, a novel binarized LSTM LM is proposed to address this problem. Words are encoded into binary vectors, and the remaining LSTM parameters are further binarized to achieve high memory compression. This is the first effort to investigate binary LSTMs for large vocabulary language modeling. Experiments on both English and Chinese LM and ASR tasks show that the proposed model achieves a compression ratio of 11.3 without any loss of LM or ASR performance, and a compression ratio of 31.6 with acceptable minor performance degradation.
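To make the memory argument concrete, the back-of-the-envelope sketch below compares a float32 embedding table with a 1-bit binarized one. The vocabulary size and embedding dimension are illustrative placeholders, not the paper's configuration.

```python
# Illustrative memory comparison for an embedding table
# (numbers are placeholders, not taken from the paper).
vocab_size, embed_dim = 10_000, 650

float32_bytes = vocab_size * embed_dim * 4     # 4 bytes per float32 weight
binary_bytes = vocab_size * embed_dim / 8      # 1 bit per binarized weight

print(f"float32 table : {float32_bytes / 2**20:.1f} MiB")
print(f"binary table  : {binary_bytes / 2**20:.1f} MiB")
print(f"compression   : {float32_bytes / binary_bytes:.0f}x for the table alone")
```

The whole-model ratios reported above (11.3 and 31.6) are lower than this 32x bound on the table itself because not every parameter is binarized and other layers also contribute to the footprint.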

Highlights

  • Language models (LMs) play an important role in natural language processing (NLP) tasks

  • For traditional recurrent neural network (RNN) based language models, the memory consumption mainly comes from the embedding layers

  • Since Penn Treebank (PTB) is a relatively small dataset and the binarized embedding language model (BELM) and the binarized LSTM language model (BLLM) converge more slowly than the long short-term memory (LSTM) language model, the learning rate is halved every three epochs if the perplexity on the validation set does not decrease (see the sketch after this list)
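A minimal sketch of one reading of that schedule follows, assuming a PyTorch-style optimizer exposing `param_groups`; the `run_epoch` and `evaluate_ppl` helpers are placeholders, not the authors' code.

```python
# Sketch of the halving schedule from the last highlight (one possible reading):
# halve the learning rate once validation perplexity has not improved for
# three consecutive epochs. All names are placeholders.
def train_with_lr_halving(model, optimizer, num_epochs, run_epoch, evaluate_ppl, patience=3):
    best_ppl = float("inf")
    epochs_without_gain = 0
    for epoch in range(num_epochs):
        run_epoch(model, optimizer)          # one pass over the training data
        val_ppl = evaluate_ppl(model)        # perplexity on the validation set
        if val_ppl < best_ppl:
            best_ppl, epochs_without_gain = val_ppl, 0
        else:
            epochs_without_gain += 1
        if epochs_without_gain >= patience:
            for group in optimizer.param_groups:
                group["lr"] *= 0.5           # reduce the learning rate by half
            epochs_without_gain = 0
```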


Summary

Introduction

Language models (LMs) play an important role in natural language processing (NLP) tasks. Recurrent neural network (RNN) based models are widely used for their excellent performance (Mikolov et al., 2010). Gated structures such as long short-term memory (LSTM) (Hochreiter and Schmidhuber, 1997) and the gated recurrent unit (GRU) (Chung et al., 2014) improve on plain recurrent structures and achieve state-of-the-art performance on most NLP tasks. However, the word embedding parameters are floating-point values, which dominate the memory consumption. The first contribution of this paper is a novel language model, the binarized embedding language model (BELM), which significantly reduces memory consumption. The second contribution is that the LSTM language model itself is binarized and combined with the binarized embeddings to further compress the parameter space.
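A minimal sketch of the core idea behind a binarized embedding layer is shown below, assuming a PyTorch implementation with sign-based binarization and a straight-through estimator for the gradient. This illustrates the general technique, not the authors' exact formulation; all names and sizes are placeholders.

```python
# Minimal sketch (not the authors' code): a binarized embedding layer with a
# straight-through estimator, assuming a PyTorch setup.
import torch
import torch.nn as nn


class BinarizeSTE(torch.autograd.Function):
    """sign() in the forward pass, clipped identity gradient in the backward pass."""

    @staticmethod
    def forward(ctx, weight):
        ctx.save_for_backward(weight)
        return torch.sign(weight)          # values in {-1, +1} (exact zeros map to 0)

    @staticmethod
    def backward(ctx, grad_output):
        (weight,) = ctx.saved_tensors
        # Straight-through estimator: pass the gradient where |w| <= 1, zero elsewhere.
        return grad_output * (weight.abs() <= 1).float()


class BinarizedEmbedding(nn.Module):
    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        # Real-valued latent weights are kept for the optimizer update;
        # only their binarized version is used in the forward pass.
        self.weight = nn.Parameter(torch.empty(vocab_size, embed_dim).uniform_(-1, 1))

    def forward(self, token_ids):
        binary_weight = BinarizeSTE.apply(self.weight)
        return nn.functional.embedding(token_ids, binary_weight)


if __name__ == "__main__":
    emb = BinarizedEmbedding(vocab_size=10_000, embed_dim=256)
    ids = torch.tensor([[1, 5, 42]])
    print(emb(ids).shape)                  # torch.Size([1, 3, 256])
```

At inference time the real-valued latent weights can be discarded and only the binary table stored, which is where the memory saving comes from; binarizing the remaining LSTM weights follows the same pattern.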

Related Work
LSTM Language Model
Binarized Embedding Language Model
Binarized LSTM Language Model
Memory Reduction
Experimental Setup
Experiments in Language Modeling
Experiments on ASR Rescoring Tasks
Investigation of Binarized Embeddings
Findings
Conclusion
