Abstract
Increasing the capacity of recurrent neural networks (RNN) usually involves augmenting the size of the hidden layer, with a significant increase in computational cost. Recurrent neural tensor networks (RNTN) increase capacity using distinct hidden layer weights for each word, but at a much greater cost in memory usage. In this paper, we introduce restricted recurrent neural tensor networks (r-RNTN), which reserve distinct hidden layer weights for frequent vocabulary words while sharing a single set of weights for infrequent words. Perplexity evaluations show that for fixed hidden layer sizes, r-RNTNs improve language model performance over RNNs using only a small fraction of the parameters of unrestricted RNTNs. These results hold for r-RNTNs using Gated Recurrent Units and Long Short-Term Memory.
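To make the weight-sharing scheme concrete, below is a minimal NumPy sketch of one r-RNTN step. The mapping function f, the sigmoid nonlinearity, the variable names, and the toy dimensions are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

# Illustrative sketch, not the paper's implementation: the K most frequent
# words each get their own recurrence matrix, all other words share one.
V, H, K = 10_000, 100, 100  # vocabulary size, hidden size, distinct recurrence matrices

rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(V, H))         # input word embeddings
U = rng.normal(scale=0.1, size=(K + 1, H, H))  # K word-specific matrices + 1 shared matrix
b = np.zeros(H)

def f(word_id, word_rank):
    """Map a word to a recurrence matrix index: frequent words (rank < K)
    get a distinct matrix; all remaining words share matrix K."""
    return word_rank[word_id] if word_rank[word_id] < K else K

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def step(word_id, h_prev, word_rank):
    """One recurrent step where the recurrence matrix depends on the current word."""
    W = U[f(word_id, word_rank)]
    return sigmoid(E[word_id] + W @ h_prev + b)

# Usage example: words are assumed to be indexed by frequency rank already,
# so word_rank is the identity mapping here.
word_rank = np.arange(V)
h = np.zeros(H)
for w in [5, 42, 9_999]:  # two frequent words, one infrequent word (shares matrix K)
    h = step(w, h, word_rank)
```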
Highlights
Recurrent neural networks (RNN), which compute their output conditioned on a previously stored hidden state, are a natural solution to sequence modeling. Mikolov et al. (2010) applied RNNs to word-level language modeling, outperforming traditional n-gram methods.
We focus on related work that addresses language modeling via RNNs, word representation, and conditional computation.
With H = 100, as model capacity grows with K, test set perplexity drops, showing that restricted recurrent neural tensor networks (r-RNTN) are an effective way to increase model capacity with no additional computational cost.
Summary
Recurrent neural networks (RNN), which compute their output conditioned on a previously stored hidden state, are a natural solution to sequence modeling. Mikolov et al. (2010) applied RNNs to word-level language modeling (we refer to this model as s-RNN), outperforming traditional n-gram methods. Sutskever et al. (2011) increased the performance of a character-level language model with a multiplicative RNN (m-RNN), the factored approximation of a recurrent neural tensor network (RNTN), which maps each symbol to separate hidden layer weights (referred to as recurrence matrices from here on). Having separate recurrence matrices for each symbol requires memory that is linear in the symbol vocabulary size (|V|). This is not an issue for character-level models, which have small vocabularies, but is prohibitive for word-level models, which can have vocabulary sizes in the millions if we consider surface forms.
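To illustrate why memory linear in |V| is prohibitive at the word level, the following back-of-the-envelope comparison uses assumed values for |V|, H, and K (not the paper's reported configuration):

```python
# Illustrative recurrence-parameter counts: an unrestricted RNTN stores one
# H x H recurrence matrix per vocabulary word, an r-RNTN stores only K + 1,
# and an s-RNN stores a single shared matrix.
V, H, K = 10_000, 100, 100

rntn_recurrence_params = V * H * H            # 100,000,000 -- linear in |V|
r_rntn_recurrence_params = (K + 1) * H * H    # 1,010,000   -- independent of |V|
s_rnn_recurrence_params = H * H               # 10,000      -- single shared matrix

print(rntn_recurrence_params, r_rntn_recurrence_params, s_rnn_recurrence_params)
```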