Abstract

Despite the tremendous empirical success of neural models in natural language processing, many of them lack the strong intuitions that accompany classical machine learning approaches. Recently, connections have been shown between convolutional neural networks (CNNs) and weighted finite state automata (WFSAs), leading to new interpretations and insights. In this work, we show that some recurrent neural networks also share this connection to WFSAs. We characterize this connection formally, defining rational recurrences to be recurrent hidden state update functions that can be written as the Forward calculation of a finite set of WFSAs. We show that several recent neural models use rational recurrences. Our analysis provides a fresh view of these models and facilitates devising new neural architectures that draw inspiration from WFSAs. We present one such model, which performs better than two recent baselines on language modeling and text classification. Our results demonstrate that transferring intuitions from classical models like WFSAs can be an effective approach to designing and understanding neural models.
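
To make the definition above concrete, here is a minimal sketch of the Forward calculation of a single WFSA, written as a recurrent hidden-state update. The names (wfsa_forward, transition, initial) and the toy weights are illustrative, not from the paper:

    import numpy as np

    def wfsa_forward(initial, transition, tokens):
        """Forward calculation of a WFSA, phrased as a recurrent update.

        initial:    length-k vector of start weights, one per WFSA state
        transition: dict mapping each token to a (k x k) matrix of arc weights
        The hidden state h_t holds, for each WFSA state, the total weight
        of all paths reaching that state after reading tokens[:t].
        """
        h = initial
        for x in tokens:
            h = h @ transition[x]  # recurrent update: h_t = h_{t-1} A[x_t]
        return h

    # toy example: a 2-state WFSA over a binary vocabulary
    A = {0: np.array([[1.0, 0.5], [0.0, 1.0]]),
         1: np.array([[1.0, 0.0], [0.0, 0.9]])}
    print(wfsa_forward(np.array([1.0, 0.0]), A, [0, 1, 1]))  # [1.0, 0.405]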

Highlights

  • Neural models, and in particular gated variants of recurrent neural networks (RNNs, e.g., Hochreiter and Schmidhuber, 1997; Cho et al., 2014), have become a core building block for state-of-the-art approaches in NLP (Goldberg, 2016).

  • In this work we show that many neural models are more interpretable than previously thought.

  • We present a new model motivated by the interpolation of a two-state weighted finite-state automaton (WFSA) and a three-state one, capturing unigram and bigram features, respectively (a rough sketch of these recurrences follows this list).
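
As an illustration of the last highlight, the following sketch shows one way the element-wise recurrences of such unigram and bigram WFSAs can be written. The gate and weight names (f, u, v) are hypothetical placeholders, and the paper's actual parameterization may differ:

    def unigram_step(c, u, f):
        # Forward step of a 2-state WFSA: the accepting state accumulates
        # single-token evidence u_t, decayed by a self-loop weight f_t.
        return f * c + u

    def bigram_step(c1, c2, u, v, f1, f2):
        # Forward step of a 3-state WFSA, i.e. the update induced by the
        # per-token transition matrix [[1, u, 0], [0, f1, v], [0, 0, f2]]:
        # state 1 starts a bigram with weight u_t; state 2 extends a
        # partial match with a second token of weight v_t.
        new_c1 = f1 * c1 + u
        new_c2 = f2 * c2 + c1 * v
        return new_c1, new_c2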


Summary

Introduction

Neural models, and in particular gated variants of recurrent neural networks (RNNs, e.g., Hochreiter and Schmidhuber, 1997; Cho et al., 2014), have become a core building block for state-of-the-art approaches in NLP (Goldberg, 2016). While these models empirically outperform classical NLP methods on many tasks (Zaremba et al., 2014; Bahdanau et al., 2015; Dyer et al., 2016; Peng et al., 2017, inter alia), they typically lack the intuition offered by classical models, making it hard to understand the roles played by each of their components. We study several recently proposed RNN architectures and show that one can use WFSAs to characterize their recurrent updates; we call such models rational recurrences (§3).
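
To make this characterization concrete, the scalar toy below shows one such correspondence: a gated cell update of the form c_t = f_t * c_{t-1} + i_t * x_t is exactly the Forward step of a two-state WFSA. This is a minimal sketch in the spirit of the paper's constructions, not the paper's own code:

    import numpy as np

    def gated_step(c, x, f, i):
        # a simplified gated RNN cell update (scalar, for illustration)
        return f * c + i * x

    def wfsa_step(h, x, f, i):
        # Forward step of a 2-state WFSA whose transition matrix for token x is
        #   [[1, i*x],
        #    [0, f  ]]
        A = np.array([[1.0, i * x],
                      [0.0, f]])
        return h @ A

    c, h = 0.0, np.array([1.0, 0.0])
    for x, f, i in [(0.3, 0.9, 0.5), (1.2, 0.7, 0.4)]:
        c = gated_step(c, x, f, i)
        h = wfsa_step(h, x, f, i)
    print(c, h[1])  # both print 0.585: the accepting-state weight is the cell

The two traces agree at every step, since the accepting state's path weight obeys the same recurrence as the gated cell.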
