Abstract

We show that the mutual information between two symbols, as a function of the number of symbols between the two, decays exponentially in any probabilistic regular grammar, but can decay like a power law for a context-free grammar. This result about formal languages is closely related to a well-known result in classical statistical mechanics that there are no phase transitions in dimensions fewer than two. It is also related to the emergence of power law correlations in turbulence and cosmological inflation through recursive generative processes. We elucidate these physics connections and comment on potential applications of our results to machine learning tasks like training artificial recurrent neural networks. Along the way, we introduce a useful quantity, which we dub the rational mutual information, and discuss generalizations of our claims involving more complicated Bayesian networks.

Highlights

  • Critical behavior, where long-range correlations decay as a power law with distance, has many important physics applications ranging from phase transitions in condensed matter experiments to turbulence and inflationary fluctuations in our early Universe

  • As discussed in previous works [9, 11, 13], the plot shows that the number of bits of information that one symbol provides about another drops roughly as a power law with the distance between them

  • Just how generic is the scaling behavior of our model? What if the length of the words is not constant? What about more complex dependencies between layers? If we retrace the derivation in the above arguments, it becomes clear that the only key feature of all of our models considered so far is that the rational mutual information decays exponentially with the causal distance Δ: I_R ∼ e^(−γΔ)
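The mechanism behind this highlight can be illustrated with a small simulation. In a recursive (tree-structured) generative process, each symbol spawns children whose values depend probabilistically on the parent, so the causal distance between two leaves grows only logarithmically with their separation in the sequence; an exponential decay in causal distance then becomes a power-law decay in separation. The following sketch is our own illustrative construction (the binary branching, the transition table `Q`, and the depth are hypothetical choices, not taken from the paper):

```python
import random

# Illustrative symbol-to-symbol transition probabilities (hypothetical values):
# a child copies its parent's symbol with probability 0.85.
Q = {0: [0.85, 0.15], 1: [0.15, 0.85]}

def generate(depth, symbol=0):
    """Recursively expand `symbol` into a sequence of 2**depth leaf symbols.

    Each node draws two children from Q conditioned on its own symbol, so
    leaves separated by distance d in the sequence are connected through
    roughly 2*log2(d) probabilistic steps up and down the tree.
    """
    if depth == 0:
        return [symbol]
    left = random.choices([0, 1], weights=Q[symbol])[0]
    right = random.choices([0, 1], weights=Q[symbol])[0]
    return generate(depth - 1, left) + generate(depth - 1, right)

random.seed(0)          # reproducibility for this sketch
seq = generate(12)      # 4096 symbols with long-range (power-law) correlations
```

Because correlations between leaves are attenuated by a constant factor per tree edge, and the number of intervening edges scales as log of the separation, the resulting mutual information falls off as a power law rather than exponentially.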


Introduction

Critical behavior, where long-range correlations decay as a power law with distance, has many important physics applications, ranging from phase transitions in condensed-matter experiments to turbulence and inflationary fluctuations in our early Universe. All measured curves are seen to decay roughly as power laws, which explains why natural language cannot be accurately modeled as a Markov process, for which the mutual information instead plummets exponentially (the example shown has I ∝ e^(−d/6)). We will show that computer descriptions of language suffer from a much simpler problem, one that involves no talk of meaning or of being non-human: they tend to get the basic statistical properties wrong. To illustrate this point, consider Markov models of natural language. Linguistic arguments typically do not produce an observable that can be used to quantitatively falsify a Markovian model of language; instead, they rely on highly specific knowledge about the data, in this case an understanding of the language's grammar.
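The exponential decay of mutual information in a Markov process can be checked directly. The sketch below (the two-state transition matrix is a hypothetical toy example, not one from the paper) computes the exact mutual information I(X_0; X_d) between symbols d steps apart in a stationary Markov chain and shows that successive ratios settle to a constant below one, i.e. exponential decay:

```python
import numpy as np

def markov_mutual_information(T, pi, d):
    """Exact mutual information (in bits) between X_0 and X_d for a
    stationary Markov chain with transition matrix T and stationary
    distribution pi."""
    Td = np.linalg.matrix_power(T, d)   # d-step transition probabilities
    joint = pi[:, None] * Td            # P(X_0 = i, X_d = j)
    indep = pi[:, None] * pi[None, :]   # product of marginals
    terms = np.where(joint > 0, joint * np.log2(joint / indep), 0.0)
    return terms.sum()

# Toy two-state chain (hypothetical numbers for illustration).
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([2 / 3, 1 / 3])           # its stationary distribution

mi = [markov_mutual_information(T, pi, d) for d in range(1, 20)]
# The ratios mi[d+1] / mi[d] approach a constant < 1 (here the square of the
# second eigenvalue, 0.7**2 = 0.49), so I(X_0; X_d) decays exponentially in d.
```

The constant governing the decay is set by the second-largest eigenvalue of the transition matrix, which is why no Markov (regular-grammar) model can reproduce the power-law curves measured for natural language.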

Markov Implies Exponential Decay
Power Laws from Generative Grammar
A Simple Recursive Grammar Model
Further Generalization
Discussion
Connection to Recurrent Neural Networks
A New Diagnostic for Machine Learning

