CLASSIC Utterance Boundary: A Chunking‐Based Model of Early Naturalistic Word Segmentation

Francesco Cabiddu, Gary Jones, Chiara Gambi, Lewis Bott

doi:10.1111/lang.12559

Open Access

Abstract

Word segmentation is a crucial step in children's vocabulary learning. While computational models of word segmentation can capture infants’ performance in small‐scale artificial tasks, the examination of early word segmentation in naturalistic settings has been limited by the lack of measures that can relate models’ performance to developmental data. Here, we extended CLASSIC (Chunking Lexical and Sublexical Sequences in Children; Jones et al., 2021), a corpus‐trained chunking model that can simulate several memory, phonological, and vocabulary learning phenomena, so that it can perform word segmentation using utterance boundary information. We named this extended version CLASSIC utterance boundary (CLASSIC‐UB). Further, we compared our model's performance to that of children on a wide range of new measures, capitalizing on the link between word segmentation and vocabulary learning abilities. We showed that the combination of chunking and utterance‐boundary information used by CLASSIC‐UB allowed a better prediction of English‐learning children's output vocabulary than did other models.

Full Text