A common approach to obtaining the computational benefits of depth in recurrent neural networks (RNNs) is to stack multiple recurrent layers hierarchically. However, these performance gains come at the cost of difficult optimization, since hierarchical RNNs (HRNNs) are deep both hierarchically and temporally. Prior work has separately highlighted the importance of highways (direct shortcut connections) for learning deep hierarchical representations and for capturing long temporal dependencies, yet little effort has been made to unify these findings into a single framework for learning deep HRNNs. We propose the hierarchical recurrent highway network (HRHN), which embeds highways within both the hierarchical and the temporal structure of the network, enabling unimpeded information propagation across both dimensions and thereby alleviating the vanishing-gradient problem. The proposed HRHN also requires significantly fewer data-dependent parameters than related methods. Experiments on language modeling (LM) tasks demonstrate that the proposed architecture leads to effective models. On character-level LM on the Hutter Prize dataset, the model achieves an entropy of 2.4 bits per character. On word-level LM on the Penn Treebank dataset, it attains a perplexity of 68.1. HRHN outperforms the baseline and related models that we tested.
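The exact gating equations of HRHN are given in the body of the paper; as a rough orientation only, the following minimal PyTorch sketch illustrates the general idea of highway-gated mixing in both the temporal and hierarchical dimensions. All class, method, and parameter names here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class HighwayRecurrentCell(nn.Module):
    """One recurrent step with a highway (carry) path through time:
    s_t = h * t + s_{t-1} * (1 - t), so gradients can flow unchanged
    along the carry path across many time steps."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.cand = nn.Linear(input_size + hidden_size, hidden_size)  # candidate update
        self.gate = nn.Linear(input_size + hidden_size, hidden_size)  # transform gate

    def forward(self, x: torch.Tensor, s_prev: torch.Tensor) -> torch.Tensor:
        z = torch.cat([x, s_prev], dim=-1)
        h = torch.tanh(self.cand(z))       # candidate state
        t = torch.sigmoid(self.gate(z))    # transform gate; carry gate = 1 - t
        return h * t + s_prev * (1.0 - t)  # temporal highway mix


class HierarchicalHighwayRNN(nn.Module):
    """Stack of highway recurrent cells with an additional highway between
    layers: each layer's state is gated against the output of the layer
    below, giving a shortcut in the hierarchical (depth) dimension too."""

    def __init__(self, input_size: int, hidden_size: int, num_layers: int):
        super().__init__()
        sizes = [input_size] + [hidden_size] * num_layers
        self.cells = nn.ModuleList(
            HighwayRecurrentCell(sizes[i], hidden_size) for i in range(num_layers)
        )
        # Depth-wise highway gates (applied above the first layer,
        # where input and state dimensions match).
        self.depth_gates = nn.ModuleList(
            nn.Linear(hidden_size, hidden_size) for _ in range(num_layers - 1)
        )

    def forward(self, x: torch.Tensor, states: list) -> list:
        """x: (batch, input_size); states: per-layer (batch, hidden_size)."""
        new_states, inp = [], x
        for l, cell in enumerate(self.cells):
            s = cell(inp, states[l])
            if l > 0:  # hierarchical highway: gated mix with the layer below
                g = torch.sigmoid(self.depth_gates[l - 1](s))
                s = g * s + (1.0 - g) * inp
            new_states.append(s)
            inp = s
        return new_states


# Example: unroll a 3-layer stack over a short sequence.
rnn = HierarchicalHighwayRNN(input_size=16, hidden_size=32, num_layers=3)
states = [torch.zeros(8, 32) for _ in range(3)]
seq = torch.randn(5, 8, 16)  # (time, batch, features)
for x_t in seq:
    states = rnn(x_t, states)
```

In both dimensions the gated identity path is what keeps gradients from vanishing: when a carry gate saturates near 1, the corresponding state (temporal or hierarchical) passes through the layer unchanged.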