Word learning entails mapping an auditory word-form to its appropriate grammatical category (e.g., noun, verb, adjective). Before that mapping can occur, however, the naïve learner must infer which of the myriad possible referents of the word the speaker intended. This creates a computational explosion of referential ambiguity known as the gavagai problem. In a set of corpus analyses of parental speech directed to young infants, we describe the distributional information available to early word learners, focusing on nouns and adjectives that refer to whole objects and to object properties. In two experiments on word learning in adults, spanning seven distributional conditions, we document how varying the ratio of novel object labels to novel property labels affects the robustness of word learning. Our results suggest that the language input to 6- to 20-month-olds is robustly populated with high-frequency object words and high-frequency property words, but that the two rarely co-occur. Although this distributional information slightly favors object words over property words, a more plausible account of the whole-object bias in early word learning is infants' inability to encode the details of an object or event during rapid naming. Results from adults presented with novel labels for multi-referent objects in a cross-situational statistical learning paradigm also reveal this whole-object bias, as well as an absence of property-label generalization to novel objects, even when the distribution of labels is shifted almost exclusively to property words. We discuss these results in terms of the relative ease of mapping auditory word-forms to whole objects versus object properties, an asymmetry that limits the combinatorics of the gavagai problem, especially for infants with immature encoding and memory-representation abilities.