Abstract

To acquire language, infants must learn to segment words from running speech. A substantial body of experimental research shows that infants use multiple cues to do so; however, little research has systematically examined the distribution of such cues in naturalistic speech. We conducted a comprehensive corpus analysis of German child-directed speech (CDS) using data from the Child Language Data Exchange System (CHILDES) database, investigating the availability of word stress, transitional probabilities (TPs), and lexical and sublexical frequencies as potential cues for word segmentation. Seven hours of data (~15,000 words) were coded, representing approximately an average day of speech to infants. The analysis revealed that for 97% of words, primary stress was carried by the initial syllable, implicating stress as a reliable cue to word onset in German CDS. Word identity was also marked by TPs between syllables, which were higher within than between words, and higher for backward than forward transitions. Words followed a Zipfian-like frequency distribution, and over three-quarters of words (78%) were monosyllabic. Of the 50 most frequent words, 82% were function words, which accounted for 47% of word tokens in the entire corpus. Finally, 15% of all utterances comprised single words. These results provide rich, novel insights into the availability of segmentation cues in German CDS and support the possibility that infants draw on multiple converging cues to segment their input. The data, which we make openly available to the research community, will help guide future experimental investigations on this topic.
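The forward and backward TP comparison summarized above can be illustrated with a minimal sketch. It assumes (this is not the study's coding scheme) that each utterance is a list of words and each word a tuple of syllables, and it defines forward TP as freq(AB)/freq(A) and backward TP as freq(AB)/freq(B), using whole-corpus syllable counts and counting syllable pairs only within utterances. The toy corpus and the function names `syllable_stream`, `transitional_probabilities`, and `mean_tp_by_position` are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy corpus (illustrative only, not data from the study):
# each utterance is a list of words, and each word is a tuple of syllables.
CORPUS = [
    [("da",), ("ist",), ("der",), ("ba", "by")],
    [("wo",), ("ist",), ("der",), ("ball",)],
    [("ba", "by"), ("la", "chen")],
    [("der",), ("ba", "by"), ("schläft",)],
]


def syllable_stream(utterance):
    """Flatten an utterance into (syllable, word_index) pairs."""
    return [(syl, w) for w, word in enumerate(utterance) for syl in word]


def transitional_probabilities(corpus):
    """Return forward and backward TPs for every attested syllable pair.

    Forward TP:  P(B | A) = freq(AB) / freq(A)
    Backward TP: P(A | B) = freq(AB) / freq(B)
    Syllable pairs are counted within utterances only, never across them.
    """
    unigrams, bigrams = Counter(), Counter()
    for utt in corpus:
        stream = syllable_stream(utt)
        unigrams.update(syl for syl, _ in stream)
        bigrams.update((a, b) for (a, _), (b, _) in zip(stream, stream[1:]))
    forward = {(a, b): n / unigrams[a] for (a, b), n in bigrams.items()}
    backward = {(a, b): n / unigrams[b] for (a, b), n in bigrams.items()}
    return forward, backward


def mean_tp_by_position(corpus, tps):
    """Mean TP over pair tokens inside a word vs. across a word boundary."""
    within, between = [], []
    for utt in corpus:
        stream = syllable_stream(utt)
        for (a, wa), (b, wb) in zip(stream, stream[1:]):
            # Same word index on both syllables means a within-word transition.
            (within if wa == wb else between).append(tps[(a, b)])
    return mean(within), mean(between)


if __name__ == "__main__":
    forward, backward = transitional_probabilities(CORPUS)
    for name, tps in (("forward", forward), ("backward", backward)):
        w, b = mean_tp_by_position(CORPUS, tps)
        print(f"{name}: within-word {w:.2f}, between-word {b:.2f}")
```

On this toy corpus the script prints within-word means of 1.00 for both directions and between-word means of 0.70 (forward) and 0.74 (backward); these values describe only the invented data. Dividing by whole-corpus syllable counts, rather than by how often a syllable is followed by any syllable, is a choice made for this sketch.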

Highlights

  • One of the first puzzles that children must solve during language acquisition is finding boundaries between individual words in speech

  • Building upon past research that has focused on single prominent cues to segmentation (e.g., word stress: Cutler & Carter, 1987; transitional probabilities (TPs): Saksida et al., 2017; and single-word utterances: Brent & Siskind, 2001), we provide a rare comprehensive assessment of a broad range of cues that have been shown to help learners to locate word boundaries in speech, giving a rich overview of the way these cues exist in German child-directed speech (CDS)

  • We examined the position of primary within-word stress, to establish how reliable the widely assumed dominant trochaic stress pattern is as a potential cue for segmentation in German
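As a hedged illustration of how the reliability of initial stress could be quantified, the sketch below assumes each word token has been hand-coded with its syllables and the 0-based position of primary stress. The `Word` class, the `initial_stress_rate` function, and the sample tokens are hypothetical and do not reflect the study's coding format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A word token with its syllables and the position of primary stress."""
    syllables: tuple[str, ...]
    stress_index: int  # 0-based index of the syllable carrying primary stress


# Hypothetical hand-coded tokens (illustrative only, not data from the study).
WORDS = [
    Word(("kat", "ze"), 0),       # Katze: stress on the first syllable
    Word(("ba", "na", "ne"), 1),  # Banane: stress on the second syllable
    Word(("ha", "se"), 0),        # Hase: stress on the first syllable
    Word(("der",), 0),            # monosyllables are coded with stress on their only syllable
    Word(("ist",), 0),
    Word(("ball",), 0),
]


def initial_stress_rate(words, polysyllabic_only=False):
    """Proportion of word tokens whose primary stress falls on the first syllable."""
    pool = [w for w in words if len(w.syllables) > 1] if polysyllabic_only else list(words)
    if not pool:
        raise ValueError("no word tokens to score")
    return sum(w.stress_index == 0 for w in pool) / len(pool)


if __name__ == "__main__":
    print(f"all words:         {initial_stress_rate(WORDS):.2f}")
    print(f"polysyllabic only: {initial_stress_rate(WORDS, polysyllabic_only=True):.2f}")
```

Reporting the rate with and without monosyllables is a choice made for this sketch: because every monosyllable is coded here with stress on its only syllable, a corpus dominated by monosyllables can show a higher overall rate than its polysyllabic words alone.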

Summary

Introduction

One of the first puzzles that children must solve during language acquisition is finding boundaries between individual words in speech. This is no easy feat, since there are no perfectly reliable cues that learners can draw upon (Aslin et al., 1996; Lehiste, 1970). Building upon past research that has focused on single prominent cues to segmentation (e.g., word stress: Cutler & Carter, 1987; transitional probabilities (TPs): Saksida et al., 2017; and single-word utterances: Brent & Siskind, 2001), we provide a rare comprehensive assessment of a broad range of cues that have been shown to help learners to locate word boundaries in speech, giving a rich overview of the way these cues exist in German CDS.
