Abstract

Increasing the capacity of recurrent neural networks (RNN) usually involves augmenting the size of the hidden layer, with a significant increase in computational cost. Recurrent neural tensor networks (RNTN) increase capacity using distinct hidden layer weights for each word, but at a much greater cost in memory usage. In this paper, we introduce restricted recurrent neural tensor networks (r-RNTN), which reserve distinct hidden layer weights for frequent vocabulary words while sharing a single set of weights for infrequent words. Perplexity evaluations show that for fixed hidden layer sizes, r-RNTNs improve language model performance over RNNs using only a small fraction of the parameters of unrestricted RNTNs. These results hold for r-RNTNs using Gated Recurrent Units and Long Short-Term Memory.
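
As a concrete illustration of this frequency-based weight sharing, the sketch below implements one recurrence step of an r-RNTN-style cell in NumPy, assuming words are indexed by frequency rank. The class name, parameter names, and initialization are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of the r-RNTN idea (not the authors' code): the K most
# frequent words each get their own recurrence matrix, all other words share
# a single extra matrix. Names (K, hidden_size, embed_size) are assumptions.
import numpy as np

class RestrictedRNTNCell:
    def __init__(self, hidden_size, embed_size, K, seed=0):
        rng = np.random.default_rng(seed)
        self.K = K
        # K distinct recurrence matrices plus one shared matrix at index K
        self.U = rng.normal(0.0, 0.1, size=(K + 1, hidden_size, hidden_size))
        self.W = rng.normal(0.0, 0.1, size=(hidden_size, embed_size))
        self.b = np.zeros(hidden_size)

    def matrix_index(self, word_rank):
        # word_rank: 0 = most frequent word; ranks >= K use the shared matrix
        return word_rank if word_rank < self.K else self.K

    def step(self, x_t, h_prev, word_rank):
        # h_t = sigmoid(W x_t + U_{f(w_t)} h_{t-1} + b), where f selects the
        # recurrence matrix by frequency rank
        U_w = self.U[self.matrix_index(word_rank)]
        z = self.W @ x_t + U_w @ h_prev + self.b
        return 1.0 / (1.0 + np.exp(-z))
```

Because each time step still multiplies by exactly one hidden-to-hidden matrix, the per-step computation is the same as a plain RNN; only the number of stored matrices changes.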

Highlights

  • Recurrent neural networks (RNN), which compute their output conditioned on a previously stored hidden state, are a natural solution to sequence modeling. Mikolov et al. (2010) applied RNNs to word-level language modeling, outperforming traditional n-gram methods

  • We focus on related work that addresses language modeling via RNNs, word representation, and conditional computation

  • With H = 100, as model capacity grows with K, test set perplexity drops, showing that restricted recurrent neural tensor networks (r-RNTN) are an effective way to increase model capacity with no additional computational cost


Summary

Introduction

Recurrent neural networks (RNN), which compute their output conditioned on a previously stored hidden state, are a natural solution to sequence modeling. Mikolov et al. (2010) applied RNNs to word-level language modeling (we refer to this model as s-RNN), outperforming traditional n-gram methods. Sutskever et al. (2011) increased the performance of a character-level language model with a multiplicative RNN (m-RNN), the factored approximation of a recurrent neural tensor network (RNTN), which maps each symbol to separate hidden layer weights (referred to as recurrence matrices from here on). Having separate recurrence matrices for each symbol requires memory that is linear in the symbol vocabulary size (|V|). This is not an issue for character-level models, which have small vocabularies, but is prohibitive for word-level models, which can have vocabulary sizes in the millions if we consider surface forms.
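
To make the memory argument concrete, the sketch below counts recurrence parameters for an unrestricted RNTN (one H x H matrix per vocabulary word) versus an r-RNTN (K + 1 matrices). The vocabulary size and K are illustrative values chosen for this example; H = 100 matches the highlights above.

```python
# Back-of-the-envelope memory comparison (illustrative numbers, not results
# from the paper): an unrestricted RNTN stores one H x H recurrence matrix per
# vocabulary word, while an r-RNTN stores only K + 1 such matrices.
def recurrence_params(vocab_size, hidden_size, K=None):
    matrices = vocab_size if K is None else K + 1
    return matrices * hidden_size * hidden_size

V, H, K = 10_000, 100, 100                      # assumed example sizes
print(recurrence_params(V, H))                  # RNTN:   10,000 * 100 * 100 = 100,000,000
print(recurrence_params(V, H, K=K))             # r-RNTN:    101 * 100 * 100 =   1,010,000
```

Under these assumed sizes the r-RNTN keeps roughly 1% of the RNTN's recurrence parameters, which is the kind of reduction that makes word-level vocabularies tractable.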

Methods
Results
Conclusion
