Abstract

The performance of deep learning in natural language processing has been spectacular, but the reasons for this success remain unclear because of the inherent complexity of deep learning. This paper provides empirical evidence of both the effectiveness and a limitation of neural networks for language engineering. Specifically, we demonstrate that a neural language model based on long short-term memory (LSTM) effectively reproduces Zipf’s law and Heaps’ law, two representative statistical properties underlying natural language. We discuss the quality of reproducibility and the emergence of Zipf’s law and Heaps’ law as training progresses. We also point out that the neural language model has a limitation in reproducing long-range correlation, another statistical property of natural language. This understanding could provide a direction for improving the architectures of neural networks.
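For reference, the two laws mentioned above can be checked empirically on any text. The following is a minimal sketch, not taken from the paper: it assumes a whitespace-tokenized text and an illustrative file name, and computes the rank-frequency distribution (Zipf’s law) and the vocabulary growth curve (Heaps’ law).

```python
# Minimal sketch (assumptions: whitespace tokenization, illustrative file name).
from collections import Counter

def zipf_rank_frequency(tokens):
    """Return (rank, frequency) pairs; Zipf's law predicts frequency ~ rank^-1."""
    counts = Counter(tokens)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))

def heaps_vocabulary_growth(tokens, step=1000):
    """Return (text length, vocabulary size) pairs; Heaps' law predicts
    vocabulary size ~ length^beta with beta < 1."""
    seen, growth = set(), []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0:
            growth.append((i, len(seen)))
    return growth

if __name__ == "__main__":
    with open("corpus.txt", encoding="utf-8") as f:  # hypothetical corpus file
        tokens = f.read().split()
    # On log-log axes, both curves should be close to straight lines
    # if the statistical laws hold for this text.
    print(zipf_rank_frequency(tokens)[:10])
    print(heaps_vocabulary_growth(tokens)[:10])
```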

Highlights

  • Deep learning has performed spectacularly in various natural language processing tasks such as machine translation [1], text summarization [2], dialogue systems [3], and question answering [4]

  • We have found that two well-acknowledged statistical laws of natural language (Zipf’s law [12] and Heaps’ law [13], [14], [15]) almost hold for the pseudo-text generated by a neural language model

  • The stacked long short-term memory (LSTM) can reproduce the power-law behavior of the rank-frequency distribution of long n-grams (see the sketch after this list). These results indicate that a neural language model can learn the statistical laws behind natural language, and that the stacked LSTM is especially capable of reproducing both patterns of n-grams and the properties of vocabulary growth
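The last highlight concerns the rank-frequency distribution of n-grams in the generated pseudo-text. As a minimal sketch (not the paper’s own code; the file name and the values of n are illustrative assumptions), such a distribution can be computed as follows:

```python
# Minimal sketch: rank-frequency distribution of n-grams in a pseudo-text,
# assuming whitespace tokenization and an illustrative file name.
from collections import Counter

def ngram_rank_frequency(tokens, n):
    """Count n-grams and return their frequencies sorted by rank.
    A power-law rank-frequency curve appears as a straight line on log-log axes."""
    ngrams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(ngrams)
    return sorted(counts.values(), reverse=True)

if __name__ == "__main__":
    with open("pseudo_text.txt", encoding="utf-8") as f:  # hypothetical model output
        tokens = f.read().split()
    for n in (1, 3, 5):  # short and long n-grams
        freqs = ngram_rank_frequency(tokens, n)
        print(n, freqs[:10])
```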

Summary

Introduction

Deep learning has performed spectacularly in various natural language processing tasks such as machine translation [1], text summarization [2], dialogue systems [3], and question answering [4]. However, the reasons for this success remain unclear because of the inherent complexity of deep learning. We have found that two well-acknowledged statistical laws of natural language (Zipf’s law [12] and Heaps’ law [13], [14], [15]) almost hold for the pseudo-text generated by a neural language model. This finding is notable because previous language models, such as Markov models, cannot reproduce such properties, and mathematical models designed to reproduce statistical laws [16], [17] are limited in their purpose. The analyses described in this paper contribute to our understanding of the performance of neural networks and provide guidance as to how we can improve models.
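For concreteness, the following is a minimal sketch of a stacked LSTM language model in PyTorch that can sample a pseudo-text for this kind of analysis. The vocabulary size, embedding and hidden dimensions, number of layers, and the sampling procedure are illustrative assumptions, not the architecture or hyperparameters reported in the paper.

```python
# Minimal sketch of a stacked LSTM language model (illustrative hyperparameters).
import torch
import torch.nn as nn

class StackedLSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=256, hidden_dim=512, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        # tokens: (batch, seq_len) integer word ids
        emb = self.embed(tokens)
        hidden, state = self.lstm(emb, state)
        return self.out(hidden), state  # logits over the next word

    @torch.no_grad()
    def generate(self, start_id, length=1000):
        """Sample `length` tokens of pseudo-text following `start_id`."""
        ids, state = [start_id], None
        tok = torch.tensor([[start_id]])
        for _ in range(length):
            logits, state = self(tok, state)
            probs = torch.softmax(logits[0, -1], dim=-1)
            tok = torch.multinomial(probs, 1).view(1, 1)
            ids.append(tok.item())
        return ids
```

Pseudo-text sampled from a trained model of this kind is the input to which the rank-frequency and vocabulary-growth analyses above would be applied.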

Neural language model
The Emergence of Zipf’s law and Heaps’ law
Neural language models are limited in reproducing long-range correlation
Findings
Conclusion