Abstract

In a previous study, we introduced dynamical aspects of written texts by regarding the serial sentence number, from the first to the last sentence of a given text, as discretized time. Using this definition of a textual timeline, we defined an autocorrelation function (ACF) for word occurrences and demonstrated its utility both for representing dynamic word correlations and for measuring word importance within the text. In this study, we seek a stochastic process governing the occurrences of a given word having strong dynamic correlations. This is valuable because words exhibiting strong dynamic correlations play a central role in developing or organizing textual contexts. In this search, we find that additive binary Markov chain theory is useful for describing strong dynamic word correlations, in the sense that it can reproduce the characteristics of autocovariance functions (an unnormalized version of ACFs) observed in actual written texts. Using this theory, we propose a model of time-varying probability that describes the probability of word occurrence in each sentence of a text. The proposed model accounts for hierarchical document structures such as chapters, sections, subsections, paragraphs, and sentences. Because such a hierarchical structure is common to most documents, our model applies broadly when interpreting dynamic word correlations in actual written texts. The main contributions of this study are therefore demonstrating the utility of additive binary Markov chain theory for analyzing dynamic correlations in written texts and offering a new model of word occurrence probability that takes the common hierarchical structure of documents into account.
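The abstract's construction can be sketched concretely. The following is a minimal illustration, under assumed notation: a given word is represented as a binary series a_t over sentence number t (a_t = 1 if the word occurs in sentence t, else 0), and the ACF is the autocovariance function normalized by its lag-0 value. The tokenizer and example sentences are hypothetical placeholders, not from the paper.

```python
import numpy as np

def occurrence_series(sentences, word):
    """Binary occurrence series of `word`: 1 if the word appears in
    sentence t, else 0, with sentence number t as discretized time."""
    return np.array([1 if word in s.split() else 0 for s in sentences])

def acvf(a, max_lag):
    """Empirical autocovariance function (ACVF) of the series a."""
    a = np.asarray(a, dtype=float)
    m, n = a.mean(), len(a)
    return np.array([np.mean((a[: n - r] - m) * (a[r:] - m))
                     for r in range(max_lag + 1)])

def acf(a, max_lag):
    """Autocorrelation function: the ACVF normalized by its lag-0 value."""
    c = acvf(a, max_lag)
    return c / c[0]

# Toy text (illustrative only): track occurrences of "cat".
sentences = ["the cat sat", "the dog ran", "a cat slept",
             "birds sang", "the cat purred"]
a = occurrence_series(sentences, "cat")
print(acf(a, 2))  # acf[0] is always 1 by construction
```

Strongly correlated (bursty) words show an ACF that stays well above zero over many lags, while uncorrelated words decay to noise immediately.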

Highlights

  • Introducing the notion of time to written texts reveals dynamical aspects of word occurrences, allowing us to apply standard dynamical analyses developed and used in the fields of signal processing and time series analysis

  • Seeking a stochastic process governing the occurrences of words with strong dynamic correlations, we find that additive binary Markov chain theory reproduces the characteristics of autocovariance functions observed in actual written texts

  • Additive binary Markov chain theory captures the characteristic dynamic-correlation behavior of Type-I words, which occur in bursty, context-specific patterns in actual written texts


Introduction

Introducing the notion of time to written texts reveals dynamical aspects of word occurrences, allowing us to apply standard dynamical analyses developed and used in the fields of signal processing and time series analysis. Type-I words are known to occur multiple times in a text in a bursty, context-specific manner, and such occurrences give the word a dynamic correlation. We found that additive binary Markov chain theory is suited to describing these correlations because it can capture the characteristic behaviors of dynamic correlations of Type-I words in actual written texts. To our knowledge, this is the first application of additive binary Markov chain theory to the analysis of written texts; the theory has previously been used to model natural phenomena such as wind generation [2]. Using this theory, we further calculated a time-varying probability that describes the occurrence probability of a given word as a function of time (i.e., sentence number). The ultimate goal of this study was the construction of a recursive model for probability distributions, which is eas-
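The mechanism named above can be sketched as follows. This is a minimal illustration of an additive binary Markov chain, not the paper's implementation: under the standard form of the theory, the probability that the next symbol is 1 depends linearly on the previous N symbols through a memory function F(r). The specific memory function below is a hypothetical exponentially decaying choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_chain(memory, pbar, length):
    """Simulate an additive binary Markov chain:
    P(a_t = 1 | history) = pbar + sum_r memory[r-1] * (a_{t-r} - pbar),
    where pbar is the stationary occurrence probability."""
    N = len(memory)
    out = list((rng.random(N) < pbar).astype(int))  # arbitrary initial history
    for _ in range(length):
        hist = np.array(out[-N:])[::-1]             # most recent symbol first
        p = pbar + np.dot(memory, hist - pbar)      # additive (linear) memory
        p = min(max(p, 0.0), 1.0)                   # clip to a valid probability
        out.append(int(rng.random() < p))
    return np.array(out[N:])

def autocovariance(a, max_lag):
    """Empirical autocovariance function of a binary sequence."""
    a = np.asarray(a, dtype=float)
    m, n = a.mean(), len(a)
    return np.array([np.mean((a[: n - r] - m) * (a[r:] - m))
                     for r in range(max_lag + 1)])

# Hypothetical slowly decaying memory function (illustrative only).
F = 0.1 * 0.6 ** np.arange(10)
seq = simulate_chain(F, pbar=0.1, length=20000)
cv = autocovariance(seq, max_lag=20)
# A positive memory function yields positive autocovariance at short lags,
# mimicking the bursty, correlated occurrences of Type-I words.
```

Because the memory enters additively, the chain's autocovariance function is determined by F, which is what makes the theory a convenient fit for the empirically observed ACVFs of strongly correlated words.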

Ogura et al.
Examples of ACVFs for Type-I and Type-II Words
Necessity of an Additive Markov Chain
Framework of Additive Binary Markov Chain Theory
Memory Functions and Occurrence Probabilities of Type-I Words
Hierarchical Model of Probability Distribution for Word Occurrences
Conclusions

