Abstract
Cross-situational learning and social pragmatic theories are prominent mechanisms for learning word meanings (i.e., word-object pairs). In this paper, the role of reinforcement in early word learning is investigated for an artificial agent. When exposed to a group of speakers, the agent comes to understand an initial set of vocabulary items belonging to the language used by the group. Both cross-situational learning and social pragmatic theory are taken into account. Joint attention and prosodic cues in the caregiver's speech are considered as social cues. During agent-caregiver interaction, the agent selects a word from the caregiver's utterance and learns the relations between that word and the objects in its visual environment. The “novel words to novel objects” language-specific constraint is assumed for computing rewards. The models are trained by maximizing the expected reward using reinforcement learning algorithms: the table-based Q-learning, SARSA, and SARSA-λ, and the neural network-based Q-learning for neural networks (Q-NN), neural fitted Q-iteration (NFQ), and deep Q-network (DQN). Neural network-based reinforcement learning models are preferred over table-based ones for their better generalization and quicker convergence. Simulations are carried out on a mother-infant interaction dataset from CHILDES for learning word-object pairings. Reinforcement is modeled in two cross-situational learning cases: (1) with joint attention (Attentional models), and (2) with joint attention and prosodic cues (Attentional-prosodic models). Attentional-prosodic models outperform Attentional models on the word-learning task, and the Attentional-prosodic DQN outperforms existing word-learning models on the same task.
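To make the learning setup concrete, the sketch below shows a tabular Q-learning update of the kind the table-based models rely on, where an action pairs a word selected from the utterance with a candidate object in view. It is a minimal illustration, not the paper's implementation: the state/action encoding, the reward scheme, and the hyperparameters (ALPHA, GAMMA, EPSILON) are all assumptions made for this example.

import random
from collections import defaultdict

# Minimal sketch, not the paper's implementation: tabular Q-learning in which
# an action pairs the selected word with one of the visible objects.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2   # assumed hyperparameters

q_table = defaultdict(float)            # Q[(word, object)] -> estimated value

def choose_object(word, visible_objects):
    # Epsilon-greedy selection of a candidate referent for the word.
    if random.random() < EPSILON:
        return random.choice(visible_objects)
    return max(visible_objects, key=lambda obj: q_table[(word, obj)])

def q_update(word, obj, reward, next_word, next_objects):
    # One Q-learning step:
    # Q(s, a) <- Q(s, a) + ALPHA * (r + GAMMA * max_a' Q(s', a') - Q(s, a))
    best_next = max(q_table[(next_word, o)] for o in next_objects)
    q_table[(word, obj)] += ALPHA * (reward + GAMMA * best_next - q_table[(word, obj)])

# Hypothetical interaction step: "ball" is heard while a ball and a cup are in
# view; a reward of +1 is assumed for a correct pairing, -1 otherwise.
obj = choose_object("ball", ["ball", "cup"])
q_update("ball", obj, reward=1.0 if obj == "ball" else -1.0,
         next_word="cup", next_objects=["ball", "cup"])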
Highlights
Infants face many complex learning problems, one of the most challenging of which is learning a language
We investigated the word-learning performance of a virtual agent using models based on six reinforcement learning algorithms: Q-learning, SARSA, SARSA-λ, Q-NN, NFQ, and deep Q-network (DQN)
It is claimed that only memory-limited models can truly mimic human performance (Frank et al., 2010)
Summary
Infants face many complex learning problems, one of the most challenging of which is learning a language. Learning words and their referents is fundamentally ambiguous, as illustrated by Quine's “Gavagai” problem (Quine et al., 2013). In this problem, the word “Gavagai” is uttered while pointing toward a rabbit in a field; the corresponding referent could be the rabbit, the field, or the color of the rabbit. Solving this problem involves a number of challenging tasks: segmenting continuous speech into words, determining the set of objects/referents/concepts present in the immediate environment, and finding a way to correlate the heard words with the seen objects. There are multiple possibilities in both the language and referent spaces, and learning the mapping between them is a non-trivial problem.
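As a generic illustration of the cross-situational idea (not the paper's model), the sketch below accumulates word-object co-occurrence counts across situations; ambiguity that cannot be resolved within a single “Gavagai” scene dissolves once the same word recurs alongside different object sets. The function names and toy data are assumptions made for this example.

from collections import Counter, defaultdict

# Generic cross-situational learning sketch (not the paper's model):
# tally how often each heard word co-occurs with each visible object.
cooccurrence = defaultdict(Counter)   # word -> Counter over candidate objects

def observe(utterance_words, visible_objects):
    # One situation: every heard word co-occurs with every seen object.
    for word in utterance_words:
        for obj in visible_objects:
            cooccurrence[word][obj] += 1

def best_referent(word):
    # The object this word has co-occurred with most often so far.
    return cooccurrence[word].most_common(1)[0][0]

# Within one scene "gavagai" is ambiguous; across scenes the rabbit wins.
observe(["gavagai"], ["rabbit", "field"])
observe(["gavagai"], ["rabbit", "tree"])
print(best_referent("gavagai"))   # -> rabbit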