Abstract

To ensure survival, animals must update the internal representations of their environment in a trial-and-error fashion. Psychological studies of associative learning and neurophysiological analyses of dopaminergic neurons have suggested that this updating process involves the temporal-difference (TD) method in the basal ganglia network. However, the way in which the component variables of the TD method are implemented at the neuronal level is unclear. To investigate the underlying neural mechanisms, we trained domestic chicks to associate color cues with food rewards. We recorded neuronal activities from the medial striatum or tegmentum in a freely behaving condition and examined how reward omission changed neuronal firing. To compare neuronal activities with the signals assumed in the TD method, we simulated the behavioral task in the form of a finite sequence composed of discrete time steps. The three signals assumed in the simulated task were the prediction signal, the target signal for updating, and the TD-error signal. In both the medial striatum and tegmentum, the majority of recorded neurons were categorized into three types according to their fitness for three models, though these neurons tended to form a continuous spectrum without distinct differences in firing rate. Specifically, two types of striatal neurons successfully mimicked the target signal and the prediction signal. A linear summation of these two types of striatal neurons fit well with the activity of one type of tegmental neuron mimicking the TD-error signal. The present study thus demonstrates that the striatum and tegmentum can convey the signals critically required for the TD method. Based on these theoretical and neurophysiological results, together with tract-tracing data, we propose a novel model to explain how the convergence of signals represented in the striatum could lead to the computation of TD error in tegmental dopaminergic neurons.
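For reference, the three signals named above correspond, in the standard TD(0) formulation, to the quantities shown below. This is a generic textbook formalization rather than the exact parameterization used in the study; the discount factor γ and learning rate α are assumed symbols introduced here for illustration.

```latex
% Standard TD(0) quantities (generic formulation; \gamma and \alpha are assumed symbols)
\begin{align*}
  \text{prediction signal:} \quad & V(S_t) \\
  \text{target signal:}     \quad & r_{t+1} + \gamma\, V(S_{t+1}) \\
  \text{TD-error signal:}   \quad & \delta_t = r_{t+1} + \gamma\, V(S_{t+1}) - V(S_t) \\
  \text{value update:}      \quad & V(S_t) \leftarrow V(S_t) + \alpha\, \delta_t
\end{align*}
```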

Highlights

  • To cope with the ever-changing environment, adaptive agents generate an internal representation of the value associated with their present state

  • We focused on the neuronal activities that occurred during the reward period when a predicted food was omitted

  • A trial is a finite sequence composed of states S0, S1, S2, S3, S4, and S_terminal, corresponding to a pre-trial period (t = 0), cue period (t = 1), peck-operant period (t = 2), delay period (t = 3), and reward period (t = 4), respectively, followed by the terminal state (see the sketch after this list)
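As a concrete illustration of this trial structure, the sketch below runs a tabular TD(0) learner over the five trial states plus the terminal state. All numerical values (reward magnitude, learning rate, discount factor, trial counts) and the choice to deliver reward on the transition out of S4 are illustrative assumptions, not parameters taken from the study.

```python
# Minimal sketch of tabular TD(0) over the trial structure described above.
# All parameters (reward size, ALPHA, GAMMA, trial counts) are illustrative
# assumptions; they are not taken from the study itself.
GAMMA = 1.0    # no discounting within a short trial (assumption)
ALPHA = 0.1    # learning rate (assumption)

STATES = ["S0", "S1", "S2", "S3", "S4", "terminal"]
V = {s: 0.0 for s in STATES}   # state-value table, initialized to zero

def run_trial(rewarded: bool):
    """Step through S0 -> ... -> S4 -> terminal, updating V with TD(0).
    Reward, if any, is delivered on the transition out of S4 (reward period)."""
    td_errors = []
    for i in range(len(STATES) - 1):
        s, s_next = STATES[i], STATES[i + 1]
        r = 1.0 if (rewarded and s == "S4") else 0.0
        delta = r + GAMMA * V[s_next] - V[s]    # TD-error signal
        V[s] += ALPHA * delta                   # value (prediction) update
        td_errors.append(delta)
    return td_errors

# Training: repeated rewarded trials let value propagate back toward the cue period.
for _ in range(200):
    run_trial(rewarded=True)

# Reward omission after training: the TD error at the reward period turns negative,
# mimicking the activity dip expected in dopaminergic neurons.
print(run_trial(rewarded=False))
```

Running the omission trial after training yields a TD error close to zero in the early states and a markedly negative TD error at the reward period, which is the quantity the recordings during reward omission are compared against.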



INTRODUCTION

To cope with the ever-changing environment, adaptive agents generate an internal representation of the value associated with their present state. Some aspects of state value are supposed to be quickly updated. In another task, in which chicks actively forage between two feeders placed at opposite ends of an I-shaped maze (Ogura and Matsushima, 2011; Ogura et al., 2015), chicks quickly changed their stay time when the profitability of the feeders changed (Xin et al., under review). We examined the assumed connections descending from the medial striatum (MSt) to the tegmentum via tract tracing combined with immunostaining for tyrosine hydroxylase (TH, a marker of dopaminergic neurons). Based on these results, we propose a novel hypothetical process in which TD learning for foraging behavior is accomplished via interactions between the MSt and the midbrain dopamine (DA) system.

MATERIALS AND METHODS
RESULTS
DISCUSSION
