Abstract

Child language acquisition is often identified as one of the primary drivers of language change, but the lack of historical child data makes its effect difficult to investigate empirically. In this work, I examine the relationship between lexicons extracted from modern child-directed speech and those drawn from modern and historical literary corpora in order to better understand when language acquisition can be modeled over historical and non-child corpora as it is over child corpora. Among high token frequency items, morphophonological and syntactic-semantic patterns occur at similar type frequencies in these corpora, and when a learning algorithm is applied to lexicons sampled from these sources, it consistently achieves the same learning outcomes in each. With appropriate care and pre-processing, modern and historical text corpora are effectively interchangeable with child-directed speech corpora for the purpose of estimating child lexical experience, opening a path for modeling language acquisition where child-directed corpora are not available.
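
The abstract does not name the learning algorithm. Purely as an illustrative stand-in (an assumption, not necessarily the model used in the paper), the sketch below applies the Tolerance Principle, a productivity criterion from Yang's work: a rule over N items stays productive as long as its exceptions e satisfy e ≤ N / ln N. Here irregular verbs count as exceptions to regular past -ed, the lexicon names and counts are invented, and "the same learning outcome" simply means the criterion returns the same verdict for different lexicons.

```python
import math


def tolerance_threshold(n: int) -> float:
    """Tolerance Principle: maximum exceptions a rule over n items can bear, n / ln n."""
    if n < 2:
        raise ValueError("n must be at least 2 (ln 1 = 0)")
    return n / math.log(n)


def is_productive(n: int, exceptions: int) -> bool:
    """A rule over n items is productive if its exceptions do not exceed n / ln n."""
    return exceptions <= tolerance_threshold(n)


# Hypothetical lexicons: (verb types, irregular past-tense types). Counts are invented.
lexicons = {
    "cds_sample": (500, 70),
    "adult_sample": (450, 65),
    "tiny_sample": (100, 30),
}
for name, (n, e) in lexicons.items():
    print(name, round(tolerance_threshold(n), 1), is_productive(n, e))
```

The two larger hypothetical lexicons receive the same verdict (productive), while the small one does not, which illustrates why lexicon size and composition matter for such comparisons.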

Highlights

  • The advent of child-directed speech (CDS) corpora in recent decades containing years’ worth of early linguistic input (e.g., CHILDES; MacWhinney 2000) has facilitated significant progress in the field of native language acquisition

  • Type frequencies in corpora derived from child-directed speech are statistically similar to those in frequency-trimmed corpora derived from adult literary genres, even though the corpora differ in their specific lexical contents

  • Adult corpora can reasonably be substituted for CDS corpora when modeling grammar learning in child language acquisition, because type frequencies are what is directly relevant, and frequency trimming is already a standard step for approximating child vocabulary size and composition when analyzing CDS for productivity
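
The highlights do not say which test supports "statistically similar". One plausible option, assumed here rather than taken from the paper, is a two-proportion z-test on the share of types showing a pattern (for example, regular past -ed) in a trimmed CDS lexicon versus a trimmed adult lexicon. The function name and all counts below are hypothetical.

```python
import math


def two_proportion_z_test(k1: int, n1: int, k2: int, n2: int) -> tuple[float, float]:
    """Two-sided z-test of H0: k1/n1 == k2/n2. Returns (z, p_value)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both samples are all-pattern or all-non-pattern: proportions are identical.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts: regular -ed types among 500 verb types in each trimmed lexicon.
z, p = two_proportion_z_test(420, 500, 410, 500)
print(f"z = {z:.2f}, p = {p:.2f}")  # z = 0.84, p = 0.40
```

A non-significant result only means the data fail to show a difference; an equivalence test such as TOST would be the stricter check for similarity.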


Summary

Introduction

The advent in recent decades of child-directed speech (CDS) corpora containing years’ worth of early linguistic input (e.g., CHILDES; MacWhinney 2000) has facilitated significant progress in the field of native language acquisition. The contribution of this paper is methodological: I establish that, despite their intuitive differences, CDS and modern and historical non-CDS corpora are fundamentally similar along the dimensions relevant for native language acquisition. Lexical variability between CDS corpora reflects the real-world variation in early linguistic experience that leads to precociousness or delays among learners (Maratsos 2000; Yang 2002). Trimming low token frequency items from a CDS corpus yields approximations of “typical” children’s lexicons that are the right size and consist primarily of high frequency items. It is these properties that make corpora of child-directed speech such useful resources for studying grammar learning. I begin in Section 2 by illustrating the effect that trimming low token frequency items has on CDS and adult corpora in Modern English. This analysis is then extended to historical corpora, where I compare semantic overlap between modern cross-linguistic CDS lexicons and historical lexicons.
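
Frequency trimming as described above can be sketched as follows, assuming a lexicon stored as a mapping from verb lemma to token count and a hypothetical cutoff `n`. The toy counts and the `IRREGULAR` set are invented for illustration, with regular past -ed standing in as the pattern whose type frequency is measured; the helper names are hypothetical, not the paper's code.

```python
from collections import Counter

# Hypothetical toy data: verb lemma -> token count in some corpus.
counts = Counter({
    "go": 120, "want": 90, "see": 80, "play": 60,
    "eat": 50, "look": 40, "jump": 5, "wander": 2,
})
# Hypothetical irregular set; every other lemma is treated as taking regular -ed.
IRREGULAR = {"go", "see", "eat"}


def trim_lexicon(token_counts: Counter, n: int) -> list[str]:
    """Keep the n types with the highest token counts (ties broken alphabetically)."""
    ranked = sorted(token_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def type_frequency(lexicon: list[str], has_pattern) -> float:
    """Share of types in the lexicon that exhibit the pattern (each type counted once)."""
    if not lexicon:
        return 0.0
    return sum(1 for word in lexicon if has_pattern(word)) / len(lexicon)


def is_regular(lemma: str) -> bool:
    return lemma not in IRREGULAR


trimmed = trim_lexicon(counts, 6)
print(trimmed)                                   # ['go', 'want', 'see', 'play', 'eat', 'look']
print(type_frequency(trimmed, is_regular))       # 0.5
print(type_frequency(list(counts), is_regular))  # 0.625
```

In this toy case, trimming to six types lowers the regular share from 5/8 to 3/6, because both low-frequency types it removes (jump, wander) are regular.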

Verbal lexicons derived from child-directed speech and adult corpora
Deploying an acquisition model
Modern English Past -ed
Findings
Conclusions