Abstract

Reinforcement learning (RL) is a theoretical framework that describes how agents learn to select options that maximize rewards and minimize punishments over time. We often make choices, however, to obtain symbolic reinforcers (e.g. money, points) that are later exchanged for primary reinforcers (e.g. food, drink). Although symbolic reinforcers are ubiquitous in our daily lives, widely used in laboratory tasks because they can be motivating, mechanisms by which they become motivating are less understood. In the present study, we examined how monkeys learn to make choices that maximize fluid rewards through reinforcement with tokens. The question addressed here is how the value of a state, which is a function of multiple task features (e.g. current number of accumulated tokens, choice options, task epoch, trials since last delivery of primary reinforcer, etc.), drives value and affects motivation. We constructed a Markov decision process model that computes the value of task states given task features to then correlate with the motivational state of the animal. Fixation times, choice reaction times, and abort frequency were all significantly related to values of task states during the tokens task (n=5 monkeys, three male and two female). Furthermore, the model makes predictions for how neural responses could change on a moment-by-moment basis relative to changes in state value. Together, this task and model allow us to capture learning and behavior related to symbolic reinforcement.Significance statement Symbolic reinforcers, like money and points, play a critical role in our lives. However, we lack a mechanistic understanding of how symbolic reinforcement can be related to fluctuations in motivation. We investigated the effect of symbolic reinforcers on behaviors related to motivation during a token reinforcement learning task, using a novel reinforcement learning model and data from five monkeys. We designed a state-based model that can capture reward-predicting features and produce state values at sub-trial resolution. Our findings suggest that the value of a task state can affect willingness to initiate a trial, speed to choose, and persistence to complete a trial. Our model makes testable predictions for within trial fluctuations of neural activity related to values of task states.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call