Abstract

One proposal that can explain the remarkable pace of word learning in young children is that they leverage the language-internal distributional similarity of familiar and novel words to make analogical inferences about the possible meanings of novel words (Lany and Gómez, 2008; Lany and Saffran, 2011; Savic et al., 2022b; Unger and Fisher, 2021; Wojcik and Saffran, 2015). However, a cognitively and developmentally plausible computational account of how language-internal lexical representations are acquired in a form that supports this kind of analogical inference has not previously been investigated. In this work, we tested the feasibility of using the Simple Recurrent Network (SRN; Elman, 1990) as the supplier of language-internal representations for analogical inference. While the SRN is in many ways well suited to this task, we discuss several theoretical challenges that might limit its success. In a series of simulations with controlled artificial languages and the CHILDES corpus, we show that recurrent neural networks (RNNs) are prone to acquiring ‘entangled’ lexical semantic representations, in which some features of a word are partially encoded in the representations of other, frequently co-occurring words. However, we also show that this problem is mitigated when RNNs are first trained on language input to young children, because the distributional structure of that input more reliably predicts the semantic category membership of individual words. Overall, our work sheds light on the conditions under which RNNs organize their learned knowledge so that word-level information can be more easily extracted and used in downstream processes such as word learning.
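To make the pipeline described above concrete, the following is a minimal sketch (not the authors' implementation) of training an Elman-style SRN on next-word prediction and then reading out language-internal lexical representations to compare a novel word against familiar ones. The toy corpus, layer sizes, the novel word "dax", and the choice of the input-embedding rows as lexical representations are all illustrative assumptions.

```python
# Hypothetical sketch: train a simple recurrent network (Elman-style) on
# next-word prediction, then extract lexical representations and rank familiar
# words by cosine similarity to a novel word. All details are illustrative.

import torch
import torch.nn as nn

# Toy corpus in which distributional contexts weakly cue semantic category
# (animals appear before "eats"; tools appear before "cuts").
corpus = [
    "the dog eats food", "the cat eats food", "the bird eats seed",
    "the knife cuts bread", "the saw cuts wood", "the dax eats food",
]
tokens = [s.split() for s in corpus]
vocab = sorted({w for sent in tokens for w in sent})
w2i = {w: i for i, w in enumerate(vocab)}

class SRN(nn.Module):
    """Elman-style simple recurrent network for next-word prediction."""
    def __init__(self, vocab_size, embed_dim=16, hidden_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)  # tanh recurrence
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))
        return self.out(h)

model = SRN(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Train on next-word prediction, one sentence at a time.
for epoch in range(300):
    for sent in tokens:
        ids = torch.tensor([[w2i[w] for w in sent]])
        logits = model(ids[:, :-1])
        loss = loss_fn(logits.reshape(-1, len(vocab)), ids[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()

# Read out lexical representations (here: rows of the input embedding) and
# rank familiar words by similarity to the novel word "dax".
reps = model.embed.weight.detach()
dax = reps[w2i["dax"]]
sims = torch.nn.functional.cosine_similarity(dax.unsqueeze(0), reps)
ranked = sorted(zip(vocab, sims.tolist()), key=lambda p: -p[1])
# Ideally "dog", "cat", "bird" rank above "knife", "saw".
print([(w, round(s, 2)) for w, s in ranked if w != "dax"][:4])
```

This sketch only illustrates the representation-extraction step that analogical inference would build on; it does not reproduce the paper's entanglement analyses or its CHILDES simulations.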
