Abstract

Context-free grammars (CFGs) were among the first formal tools used to model natural languages, and they remain relevant today as the basis of several frameworks. A key ingredient of CFGs is nested recursion. In this paper, we investigate experimentally the capability of several recurrent neural networks (RNNs) to learn nested recursion. More precisely, we measure an upper bound on their capability to do so by simplifying the task to learning a generalized Dyck language, namely one composed of matching parentheses of various kinds. To do so, we present the RNNs with a set of random strings having a given maximum nesting depth and test their ability to predict the kind of closing parenthesis when facing more deeply nested strings. We report mixed results: when generalizing to deeper nesting levels, the accuracy of standard RNNs is significantly better than chance, but still far from perfect. Additionally, we propose some non-standard stack-based models which can approach perfect accuracy, at the cost of robustness.
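
The paper's exact data-generation procedure is not reproduced on this page; the sketch below illustrates, under assumed choices (a three-pair bracket alphabet and a simple random open/close policy), how strings of a generalized Dyck language with a bounded nesting depth can be sampled, together with the closing-parenthesis targets the RNN is asked to predict.

```python
import random

# Assumed bracket inventory; the paper's generalized Dyck language uses
# several kinds of matching parentheses, but the exact alphabet is an
# assumption made here for illustration.
PAIRS = {"(": ")", "[": "]", "{": "}"}


def dyck_string(max_depth, p_open=0.5, rng=random):
    """Sample a well-nested string over PAIRS whose nesting depth never
    exceeds max_depth. An illustrative sampler, not the authors' procedure."""
    out, stack = [], []
    while True:
        if stack and (len(stack) >= max_depth or rng.random() > p_open):
            out.append(PAIRS[stack.pop()])        # close the innermost bracket
        else:
            opener = rng.choice(list(PAIRS))      # open a new bracket of a random kind
            stack.append(opener)
            out.append(opener)
        if not stack and out:                     # string is balanced: stop
            return "".join(out)


def closing_targets(s):
    """For each closing position, the kind of parenthesis a model should predict."""
    stack, targets = [], []
    for i, ch in enumerate(s):
        if ch in PAIRS:
            stack.append(ch)
        else:
            targets.append((i, PAIRS[stack.pop()]))
    return targets


if __name__ == "__main__":
    s = dyck_string(max_depth=3)
    print(s)
    print(closing_targets(s))
```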

Highlights

  • In many settings, Recurrent Neural Networks (RNNs) act as generative language models

  • The long short-term memory (LSTM) shows near-perfect accuracy on strings within the nesting depths it was trained on

  • By and large, the LSTM generalizes to unseen nesting depths with respectable, though far from perfect, accuracy


Summary

Introduction

Recurrent Neural Networks (RNNs) act as generative language models. Popular RNNs functioning on these principles include the long short-term memory (LSTM) (Hochreiter and Schmidhuber, 1997) and the gated recurrent unit (GRU) (Cho et al., 2014). Thanks to their versatility, relative ease of training and ability to model long-term dependencies, RNNs have become the leading tool for natural language processing. Even experienced computational linguists use words such as “amazing” or even “magic” to describe them, betraying that it remains mysterious how, by performing arithmetic operations, the RNN can effectively mimic human linguistic production (Karpathy, 2016). This combination of poor understanding and enthusiasm may lead less experienced researchers into believing that the capabilities of LSTM RNNs are limitless, and that, with enough data, they can model any language you throw at them.
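
No implementation accompanies this summary; as a minimal sketch of what "RNNs as generative language models" means in this setting, the snippet below defines an LSTM that reads a prefix of a bracket string and outputs a probability distribution over the next symbol. The vocabulary, layer sizes and the use of PyTorch are assumptions for illustration, not the paper's reported setup.

```python
import torch
import torch.nn as nn

# Illustrative next-token language model over a small bracket vocabulary.
# Embedding and hidden sizes are assumed, not the paper's hyperparameters.
VOCAB = ["(", ")", "[", "]", "{", "}"]
STOI = {c: i for i, c in enumerate(VOCAB)}


class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size=len(VOCAB), embed_dim=16, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        # tokens: (batch, seq_len) integer ids; returns next-token logits.
        emb = self.embed(tokens)
        hidden, state = self.lstm(emb, state)
        return self.out(hidden), state


if __name__ == "__main__":
    model = LSTMLanguageModel()
    s = "([()])"
    ids = torch.tensor([[STOI[c] for c in s]])
    logits, _ = model(ids)                        # (1, len(s), |VOCAB|)
    # Distribution over the next symbol after reading the whole prefix;
    # training on Dyck strings would sharpen these probabilities.
    probs = torch.softmax(logits[0, -1], dim=-1)
    print({c: round(float(p), 3) for c, p in zip(VOCAB, probs)})
```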

LSTM

We use the variant of the LSTM RNN defined by the following equations:
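
The equations are not reproduced on this page; the standard formulation of Hochreiter and Schmidhuber (1997), without peephole connections, is given below for reference, though the variant used in the paper may differ in such details.

```latex
\[
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
\]
```
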
Generalized-Dyck Language
Interpretation
Subtask A
Results
Subtask B
Related work
Learnability of depth recursion
Suitability of RNN variants
Deep recursion in natural language