Abstract

Taylor’s law describes the fluctuation characteristics underlying a complex system in which the variance of the number of events within a time span grows as a power of the mean. Although Taylor’s law has been applied in many natural and social systems, its application to language has been scarce. This article describes a new, natural way to apply Taylor analysis to texts. The method was applied to over 1100 texts across 14 languages and showed that the Taylor exponents of written natural language texts were consistently around 0.58, thus indicating that the exponent is universal. The exponents were also evaluated for other language-related data, such as speech corpora (0.63 for adult speech, 0.68 for child-directed speech), programming language sources (0.79), and music (0.79). The results show that the Taylor exponent serves to quantify the fundamental structural complexity underlying linguistic time series. To explain the nature of natural language sequences exhibiting these different degrees of fluctuation, we investigated various mathematical models that could produce a Taylor exponent similar to that of real data. While the majority of previous probabilistic sequential models could not produce a Taylor exponent larger than 0.50, the value obtained for independent and identically distributed (i.i.d.) sequences, random walk sequences on complex networks could produce such larger fluctuation. We show that among various possibilities, random walks on a Barabási-Albert (BA) graph with small mean degree could fulfill the scaling properties of Zipf’s law and the long-range correlation, in addition to having a Taylor exponent larger than 0.5, thus giving a new perspective from which to reconsider the nature of language.
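The Taylor analysis and the BA random-walk model summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it cuts a token sequence into non-overlapping segments, computes the mean μ and standard deviation σ of each word type's per-segment count, and takes the Taylor exponent α as the least-squares slope of log σ against log μ (fitting σ rather than the variance, so that i.i.d. data give roughly 0.5, consistent with the abstract). The segment length, the plain least-squares fit, the Zipf-like weights of the i.i.d. baseline, the helper names (`taylor_exponent`, `barabasi_albert`, `random_walk`), the complete-graph seed on m + 1 nodes, and the choice m = 1 are all assumptions made for illustration. No particular value is claimed for the BA walk's exponent under these parameters.

```python
import math
import random
from collections import Counter


def taylor_exponent(tokens, segment_size):
    """Estimate alpha in sigma ~ mu**alpha (hypothetical helper).

    Assumption: non-overlapping segments of `segment_size` tokens and an
    ordinary least-squares fit of log(sigma) against log(mu) over word types.
    """
    n_segments = len(tokens) // segment_size
    if n_segments < 2:
        raise ValueError("need at least two full segments")
    segments = [Counter(tokens[i * segment_size:(i + 1) * segment_size])
                for i in range(n_segments)]
    vocab = set().union(*segments)
    log_mu, log_sigma = [], []
    for word in vocab:
        counts = [seg[word] for seg in segments]  # Counter yields 0 if absent
        mu = sum(counts) / n_segments
        var = sum((c - mu) ** 2 for c in counts) / n_segments
        if var > 0:  # constant counts would give log(0); skip them
            log_mu.append(math.log(mu))
            log_sigma.append(0.5 * math.log(var))
    mx = sum(log_mu) / len(log_mu)
    my = sum(log_sigma) / len(log_sigma)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_mu, log_sigma))
    sxx = sum((x - mx) ** 2 for x in log_mu)
    return sxy / sxx


def barabasi_albert(n_nodes, m, rng):
    """Adjacency lists of a Barabási-Albert graph (preferential attachment)."""
    adj = {i: set() for i in range(n_nodes)}
    # Assumed seed: a complete graph on the first m + 1 nodes.
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
    # Each node appears in `ends` once per incident edge, so a uniform draw
    # from `ends` picks a node with probability proportional to its degree.
    ends = [v for v in range(m + 1) for _ in adj[v]]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            ends.extend([new, t])
    return {v: sorted(nbrs) for v, nbrs in adj.items()}


def random_walk(adj, length, rng):
    """Node IDs visited by a simple random walk; each ID acts as a 'word'."""
    node = rng.randrange(len(adj))
    walk = []
    for _ in range(length):
        walk.append(node)
        node = rng.choice(adj[node])  # move to a uniformly chosen neighbor
    return walk


if __name__ == "__main__":
    rng = random.Random(0)
    # i.i.d. baseline: tokens drawn independently with Zipf-like weights 1/rank.
    vocab_size = 1000
    weights = [1 / r for r in range(1, vocab_size + 1)]
    iid = rng.choices(range(vocab_size), weights=weights, k=100_000)
    print("i.i.d. Zipf:", round(taylor_exponent(iid, 1000), 3))
    # Random walk on a BA graph with small mean degree (m = 1 gives a tree).
    graph = barabasi_albert(1000, 1, rng)
    walk = random_walk(graph, 100_000, rng)
    print("BA random walk:", round(taylor_exponent(walk, 1000), 3))
```

The i.i.d. baseline should land near 0.5; comparing it with the BA walk's value, and varying `m`, the graph size, and the segment length, is the kind of experiment the abstract describes, though the parameters used in the article may differ.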
