Is there a bilingual disadvantage for word segmentation? A computational modeling approach

Laia Fibla, Alejandrina Cristia, Nuria Sebastian-Galles

doi:10.1017/s0305000921000568

Abstract

Since there are no systematic pauses delimiting words in speech, the problem of word segmentation is formidable even for monolingual infants. We use computational modeling to assess whether word segmentation is substantially harder in a bilingual than a monolingual setting. Seven algorithms representing different cognitive approaches to segmentation are applied to transcriptions of naturalistic input to young children, carefully processed to generate perfectly matched monolingual and bilingual corpora. We vary the overlap in phonology and lexicon experienced by modeling exposure to languages that are more similar (Catalan and Spanish) or more different (English and Spanish). We find that the greatest variation in performance is due to different segmentation algorithms and the second greatest to language, with bilingualism having effects that are smaller than both algorithm and language effects. Implications of these computational results for experimental and modeling approaches to language acquisition are discussed.

Highlights

Unlike in written language, there are no spaces between words when we speak.
We know infants must have found a solution to this difficult problem because they know the meaning of some words by 6 months (Tincoff & Jusczyk, 1999, 2012), and they must have been able to learn at least those phonological sequences or word forms.
The question of how infants approach the problem of word segmentation has been the focus of intensive cross-disciplinary research in recent years, combining experimental studies on infants and adults, mostly monolinguals, with computational modeling.

Summary

Introduction

There are no spaces between words when we speak, and no obvious, infallible cues indicate word boundaries (e.g., Brent & Siskind, 2001). We know infants must have found a solution to this difficult problem because they know the meaning of some words by 6 months (Tincoff & Jusczyk, 1999, 2012), and they must have been able to learn at least those phonological sequences or word forms. Some evidence suggests that infants do not wait to learn true words (i.e., form-meaning pairings), but instead start segmenting their input and memorizing high-frequency sequences as early as 6 months, to the point that they accumulate a proto-lexicon of about 500 word [...]. In the rest of this Introduction, we briefly introduce the problem of word segmentation and review previous interdisciplinary research, before turning to our unique contributions.
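To make the segmentation problem concrete, the following minimal sketch shows one well-known statistical strategy: positing a word boundary wherever the forward transitional probability between adjacent syllables is low. It is offered only as an illustration. The text above does not name the seven algorithms evaluated, so this should not be read as one of them or as the authors' implementation. The toy corpus, the orthographic "syllables", the function names, and the 0.75 threshold are all assumptions chosen for the example.

```python
from collections import Counter


def train_tp(utterances):
    """Estimate forward transitional probabilities P(right | left) from
    unsegmented utterances, each given as a list of syllables."""
    left_counts = Counter()   # how often each syllable is followed by another
    pair_counts = Counter()   # how often each adjacent syllable pair occurs
    for syllables in utterances:
        for left, right in zip(syllables, syllables[1:]):
            left_counts[left] += 1
            pair_counts[(left, right)] += 1
    return {pair: n / left_counts[pair[0]] for pair, n in pair_counts.items()}


def segment(syllables, tp, threshold):
    """Insert a word boundary wherever the transitional probability between
    two adjacent syllables falls below `threshold` (an assumed cutoff)."""
    if not syllables:
        return []
    words = [[syllables[0]]]
    for left, right in zip(syllables, syllables[1:]):
        if tp.get((left, right), 0.0) < threshold:
            words.append([right])      # low predictability: start a new word
        else:
            words[-1].append(right)    # high predictability: same word
    return ["".join(word) for word in words]


# Hypothetical toy input: three two-syllable "words" in varying order,
# presented without any word boundaries.
corpus = [
    "pre tty ba by".split(),
    "pre tty do ggy".split(),
    "ba by do ggy".split(),
    "do ggy pre tty".split(),
    "ba by pre tty".split(),
    "do ggy ba by".split(),
]
tp = train_tp(corpus)
print(segment("do ggy pre tty".split(), tp, threshold=0.75))
```

In this toy input, within-word transitions are fully predictable while transitions across word boundaries are not, so the threshold separates them cleanly. Naturalistic child-directed speech is far noisier, which is one reason the choice of algorithm can matter so much.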
