Abstract

To convey meaning, human language relies on hierarchically organized, long-range relationships spanning words, phrases, sentences and discourse. As the distances between elements (e.g. phonemes, characters, words) in human language sequences increase, the strength of the long-range relationships between those elements decays following a power law. This power-law relationship has been attributed variously to long-range sequential organization present in human language syntax, semantics and discourse structure. However, non-linguistic behaviours in numerous phylogenetically distant species, ranging from humpback whale song to fruit fly motility, also demonstrate similar long-range statistical dependencies. Therefore, we hypothesized that long-range statistical dependencies in human speech may occur independently of linguistic structure. To test this hypothesis, we measured long-range dependencies in several speech corpora from children (aged 6 months–12 years). We find that adult-like power-law statistical dependencies are present in human vocalizations at the earliest detectable ages, prior to the production of complex linguistic structure. These linguistic structures cannot, therefore, be the sole cause of long-range statistical dependencies in language.

Highlights

  • Since Shannon’s original work characterizing the sequential dependencies present in language, the structure underlying long-range information in language has been the subject of a great deal of interest in linguistics, statistical physics, cognitive science and psychology [1–20]

  • We explore the relationship between long-range statistical dependencies in speech and the emergence of hierarchical linguistic organization

  • We examined mutual information (MI) decay in sequences of words over nine datasets of natural speech from English-speaking children included in the Child Language Data Exchange System (CHILDES) repository [77,82–89] and three datasets of sequences of phonemes from the PhonBank repository [76,78–80], both of which are part of the TalkBank repository [77]


Summary

Introduction

Since Shannon’s original work characterizing the sequential dependencies present in language, the structure underlying long-range information in language has been the subject of a great deal of interest in linguistics, statistical physics, cognitive science and psychology [1–20]. Across many different sequence types, including phonemes, syllables and words in both text and speech, the decay of long-range correlations and mutual information (MI) in language follows a power law (equation (2.6)) [2–14,18,19]. Similar long-range statistical dependencies also appear in non-linguistic behaviours of phylogenetically distant species, ranging from humpback whale song to fruit fly motility. If human speech is viewed as an instance of this more general class of sequentially organized behaviour, one might reasonably predict that it should display long-range statistical dependencies independent of linguistic structure. We explore the relationship between long-range statistical dependencies in speech and the emergence of hierarchical linguistic organization (i.e. syntactic, semantic and discourse structure). We ask whether the long-range statistical dependencies present in speech originate alongside the presence of syntactically complex linguistic productions, or whether they precede the production of this specific form of hierarchical structure during development. We find that human speech exhibits long-range power-law statistical dependencies like those observed in mature human language early in development, at 6–12 months of age, while children are still in the ‘babbling’ stage of language development.
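To make the idea of MI decay concrete, the following is a minimal sketch of how one might estimate MI between sequence elements as a function of the distance separating them, and then fit a power law to the resulting curve. It is an illustration under stated assumptions, not the study's pipeline: the function names (`mutual_information`, `mi_decay`, `fit_power_law`) are hypothetical, the estimator is a simple plug-in estimate with no bias correction, and the power law is fitted by ordinary least squares in log-log space. The estimator and model-fitting procedure actually used in the study are not described in this excerpt.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of MI (in bits) between two equal-length symbol lists."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def mi_decay(seq, max_dist):
    """MI between elements separated by distance d, for d = 1..max_dist."""
    return [mutual_information(seq[:-d], seq[d:]) for d in range(1, max_dist + 1)]


def fit_power_law(dists, values):
    """Fit values ~ a * d**b by least squares on log-log axes; returns (a, b).

    Assumption: non-positive values are dropped, since their log is undefined.
    """
    pts = [(math.log(d), math.log(v)) for d, v in zip(dists, values) if v > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


if __name__ == "__main__":
    # Toy input: any list of hashable symbols (phonemes, words, ...) would do.
    toy_sequence = list("abcabcabdabcabcabd" * 20)
    dists = list(range(1, 11))
    curve = mi_decay(toy_sequence, max_dist=10)
    a, b = fit_power_law(dists, curve)
    print("MI by distance:", [round(v, 3) for v in curve])
    print("fitted scale a and exponent b:", a, b)
```

A plug-in estimate like this is biased upward on short sequences, so in practice one would typically subtract a baseline (for example, MI computed on a shuffled copy of the sequence) before fitting, and compare the power-law fit against alternatives such as exponential decay.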

