Abstract

In learning models of strategic game play, an agent constructs a valuation (action value) over possible future choices as a function of past actions and rewards. Choices are then stochastic functions of these action values. Our goal is to uncover a neural signal that correlates with the action value posited by behavioral learning models. We measured activity from neurons in the superior colliculus (SC), a midbrain region involved in planning saccadic eye movements, while monkeys performed two saccade tasks. In the strategic task, monkeys competed against a computer in a saccade version of the mixed-strategy game "matching pennies". In the instructed task, saccades were elicited through explicit instruction rather than free choices. In both tasks, neuronal activity and behavior were shaped by past actions and rewards, with more recent events exerting a larger influence. Further, SC activity predicted upcoming choices during the strategic task and upcoming reaction times during the instructed task. Finally, we found that neuronal activity in both tasks correlated with an established learning model, the Experience Weighted Attraction model of action valuation (Camerer and Ho, 1999). Collectively, our results provide evidence that action values hypothesized by learning models are represented in the motor planning regions of the brain in a manner that could be used to select strategic actions.

Highlights

  • In reinforcement learning models, an individual’s choice is a probabilistic function of the current values of possible actions, which in turn are functions of past choices and past rewards (Sutton and Barto, 1998)

  • To address how value is encoded in neural signals, we introduce our measurement SCi(s,t), defined as the SCi activity associated with saccade target s in trial t of experiment i

  • Having established that choice is dependent on previous trials, and SCi activity predicts choice on a given trial, in Section “Encoding Experience Weighted Attraction (EWA) Action Value” we test our hypothesis that SCi neurons represent the action-specific valuations posited by EWA

Introduction

An individual’s choice is a probabilistic function of the current values of possible actions, which in turn are functions of past choices and past rewards (Sutton and Barto, 1998). These learning models are based on the concept of choice reinforcement, traced back to the Law of Effect (Thorndike, 1898; Erev and Roth, 1998).
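The reinforcement-learning scheme described here can be made concrete with a minimal sketch of the Experience Weighted Attraction (EWA) update (Camerer and Ho, 1999) together with a softmax (logit) choice rule. This is an illustrative implementation, not the authors' analysis code; the parameter names (`phi`, `delta`, `rho`, `beta`) follow common EWA conventions, and the default values are arbitrary assumptions.

```python
import math

def ewa_update(attractions, n_obs, chosen, payoffs,
               phi=0.9, delta=0.5, rho=0.9):
    """One EWA update step (Camerer and Ho, 1999).

    attractions : current attraction A_j for each action j
    n_obs       : experience weight N(t-1)
    chosen      : index of the action actually taken this trial
    payoffs     : payoff each action would have earned this trial
    phi         : decay of past attractions
    delta       : weight given to forgone (unchosen) payoffs
    rho         : decay of the experience weight
    """
    n_new = rho * n_obs + 1.0
    new_attr = []
    for j, a in enumerate(attractions):
        # Chosen action is reinforced by its full payoff;
        # unchosen actions by a delta-weighted forgone payoff.
        reinforcement = payoffs[j] if j == chosen else delta * payoffs[j]
        new_attr.append((phi * n_obs * a + reinforcement) / n_new)
    return new_attr, n_new

def softmax_choice_probs(attractions, beta=3.0):
    """Logit choice rule: choice probability is a stochastic
    (softmax) function of the current action values."""
    exps = [math.exp(beta * a) for a in attractions]
    z = sum(exps)
    return [e / z for e in exps]
```

With `phi`, `rho` < 1, older outcomes are progressively discounted, matching the observation that more recent events exert a larger influence on behavior; for example, a rewarded choice raises that action's attraction and hence its probability of being selected on the next trial.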
