Abstract
Since there are no systematic pauses delimiting words in speech, the problem of word segmentation is formidable even for monolingual infants. We use computational modeling to assess whether word segmentation is substantially harder in a bilingual than a monolingual setting. Seven algorithms representing different cognitive approaches to segmentation are applied to transcriptions of naturalistic input to young children, carefully processed to generate perfectly matched monolingual and bilingual corpora. We vary the overlap in phonology and lexicon experienced by modeling exposure to languages that are more similar (Catalan and Spanish) or more different (English and Spanish). We find that the greatest variation in performance is due to different segmentation algorithms and the second greatest to language, with bilingualism having effects that are smaller than both algorithm and language effects. Implications of these computational results for experimental and modeling approaches to language acquisition are discussed.
Highlights
Unlike in written language, there are no spaces between words when we speak
We know infants must have found a solution to this difficult problem because they know the meaning of some words by 6 months (Tincoff & Jusczyk, 1999, 2012), and they must have been able to learn at least those phonological sequences or word forms
The question of how infants approach the problem of word segmentation has been the focus of intensive cross-disciplinary research in recent years, combining experimental studies on infants and adults, mostly monolinguals, with computational modeling
Summary
There are no spaces between words when we speak, and there are no obvious and infallible cues that indicate word boundaries (e.g., Brent & Siskind, 2001). We know infants must have found a solution to this difficult problem because they know the meaning of some words by 6 months (Tincoff & Jusczyk, 1999, 2012), and they must have been able to learn at least those phonological sequences or word forms. Some evidence suggests that infants do not wait to learn true words (i.e., form-meaning pairings), but instead start segmenting their input and memorizing high-frequency sequences as early as 6 months, to the point that they accumulate a proto-lexicon of about 500 word [...] In the rest of this Introduction, we briefly introduce the problem of word segmentation and review previous interdisciplinary research, before turning to our unique contributions.