Abstract

Dense corpora have been put forward as necessary tools for corpus studies of language acquisition. Although they have attracted great interest, they are still rarely used, probably because they are costly to create. The goal of the present study was to predict the optimal size of a dense longitudinal corpus used to infer, manually or automatically, the details of lexical or syntactic development in child language. The results show that corpora of at least 30 to 40 one-hour recordings are necessary, but that longer corpora collected with the same protocol provide little new information. Dense corpora are indeed very useful, but they do not need to be overly large to study grammatical development. This has important consequences for corpus-building projects, which can be optimized accordingly. The existence of a limit to the amount of information provided by large corpora also has important consequences for linguistic theory, because it helps locate the threshold between learning frozen forms and generalizing knowledge about language structure.
