Temporal Credit Assignment Research Articles

Neuronal systems that are involved in reinforcement learning must solve the temporal credit assignment problem, i.e., how is a stimulus associated with a reward that is delayed in time? Theoretical studies [1-3] have postulated that neural activity underlying learning ‘tags’ synapses with an ‘eligibility trace’, and that the subsequent arrival of a reward converts the eligibility traces into actual modification of synaptic efficacies. While eligibility traces provide one simple solution to the temporal credit assignment problem, they alone do not constitute a stable learning rule because there is no other mechanism indicating when learning should cease. In order to attain stability, rules involving eligibility traces often assume that once the association is learned, further learning is prevented via an inhibition of the reward stimulus [1,3,4]. Although synaptic plasticity is responsible for reinforcement learning in the brain, theories of reinforcement learning are generally abstract and involve neither neurons nor synapses. Furthermore, biophysical theories of synaptic plasticity typically model unsupervised learning and ignore the contribution of reinforcement. Here we describe a biophysically based theory of reinforcement-modulated synaptic plasticity and postulate the existence of two eligibility traces with different temporal profiles: one corresponding to the induction of LTP, and the other to the induction of LTD. The traces have different kinetics, and their difference in magnitude at the time of reward determines whether synaptic modification will correspond to LTP or LTD. Due to the difference in their decay rates, the LTP and LTD traces can exhibit temporal competition at the reward time and thus provide a mechanism for stable reinforcement learning without the need to inhibit reward.
We test this novel reinforcement-learning rule on an experimentally motivated model of a recurrent cortical network [5], and compare the model results to experimental results at both the cellular and circuit levels. We further suggest that these eligibility traces are implemented via kinases and phosphatases, thus accounting for results at both the cellular and system levels.
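The competition between the two traces can be illustrated with a minimal sketch. It assumes, purely for illustration, that each trace decays exponentially from the moment of induction and that the weight change at reward time is proportional to the LTP trace minus the LTD trace. The function names, amplitudes, time constants, learning rate, and the choice of which trace decays faster are all hypothetical and are not taken from the article.

```python
import math

def trace(amplitude, tau, t):
    """Eligibility trace value t time units after induction (assumed exponential decay)."""
    return amplitude * math.exp(-t / tau)

def weight_change(t_reward, a_ltp=1.0, tau_ltp=1.0, a_ltd=0.5, tau_ltd=4.0, lr=0.1):
    """Weight change when reward arrives at t_reward.

    All parameter values are illustrative assumptions: here the LTP trace
    starts larger but decays faster than the LTD trace.
    """
    ltp = trace(a_ltp, tau_ltp, t_reward)
    ltd = trace(a_ltd, tau_ltd, t_reward)
    return lr * (ltp - ltd)  # positive: net LTP, negative: net LTD

def crossover_time(a_ltp=1.0, tau_ltp=1.0, a_ltd=0.5, tau_ltd=4.0):
    """Reward delay at which the two traces are equal and the net change is zero."""
    return math.log(a_ltp / a_ltd) / (1.0 / tau_ltp - 1.0 / tau_ltd)

if __name__ == "__main__":
    for t in (0.5, crossover_time(), 2.0):
        print(f"reward at t={t:.3f}: dw={weight_change(t):+.4f}")
```

With these assumed parameters, a reward arriving before the crossover delay yields net potentiation and a later reward yields net depression, so the sign of the update depends only on which trace dominates when the reward arrives.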

Almost all animal behaviors can be seen as sequences of actions towards achieving certain goals. How the association cortices learn to link sensory stimuli to a correct sequence of motor responses is not well understood, especially when only a correct sequence of responses is rewarding. We present a biologically plausible neuronal network model that can be trained to perform a large variety of tasks when only stimuli and reward contingencies are varied. The model’s aim is to learn action values in a feedforward neuronal network, and we present mechanisms to overcome the structural and temporal credit assignment problems. The temporal credit assignment problem is solved by a form of Q-learning [1]. The structural credit assignment problem is solved by a form of ‘attentional’ feedback from motor cortex to association cortex that delineates the units that should change connectivity to improve behavior [2]. Moreover, the model has a new mechanism to store traces of relevant sensory stimuli in working memory. During learning, the sensory stimuli, in combination with traces of previous stimuli in working memory, become associated with a unique set of action values. Learning in the model is biologically realistic as model units have Hebbian plasticity that is gated by two factors [2]. The first factor is a global neuromodulatory signal: reinforcers or increases in reward expectancy cause the release of neuromodulators that inform all synapses of the network whether the outcome of a trial was better or worse than expected [3]. Selective attention is the second factor that gates plasticity. Attentional feedback highlights the chain of neurons between sensory and motor cortex responsible for the selected action. Only neurons that are causally linked to the action receive attentional feedback and change the strength of their connections. Selective attention thereby solves the structural credit assignment problem.
The resulting learning rule is a form of AGREL [2], which was previously shown to be on average equivalent to error-backpropagation in classification tasks with immediate reward. The present generalization of the learning scheme, MQ-AGREL, is based on temporal difference learning and can train multilayer feedforward networks to perform delayed reward tasks with multiple epochs that require multiple behavioral responses. Importantly, MQ-AGREL learns to store in working memory information that is relevant at a later stage during a task. This memory is maintained by persistent activity of units at the intermediate network layers. We show that MQ-AGREL can be trained on many tasks that are in use in neurophysiology, including (1) (delayed) saccade-antisaccade tasks; (2) categorization tasks; and (3) probabilistic classification tasks. As a result of training, neurons at intermediate levels of the network acquire visual responses and memory responses that resemble the tuning of neurons in association areas of the cerebral cortex of animals that are trained in these same tasks. We conclude that MQ-AGREL is a powerful and biologically realistic learning rule that accounts for learning in delayed reward tasks that involve non-linear mappings from sensory stimuli and working memory onto motor responses.
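The two gating factors described above can be sketched as a three-term update: presynaptic activity, multiplied by a global temporal-difference error, multiplied by a per-unit attentional feedback signal. This is a simplified illustration under stated assumptions, not the MQ-AGREL rule itself. The function names, the Q-learning form of the error, the discount factor, and the learning rate are all hypothetical.

```python
def td_error(reward, q_selected, q_next_best, gamma=0.9):
    """Global signal: was the outcome better or worse than expected?

    Assumes a standard Q-learning style error; gamma is an illustrative value.
    """
    return reward + gamma * q_next_best - q_selected

def gated_hebbian_update(weights, pre, feedback, delta, lr=0.1):
    """Return new weights where weights[i][j] connects input i to unit j.

    A connection changes only if its presynaptic input was active AND its
    postsynaptic unit received attentional feedback; delta sets sign and size.
    """
    return [
        [w + lr * delta * pre[i] * feedback[j] for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]

if __name__ == "__main__":
    weights = [[0.0, 0.0], [0.0, 0.0]]
    pre = [1.0, 0.0]        # only input 0 was active
    feedback = [0.0, 1.0]   # only unit 1 lies on the path to the chosen action
    delta = td_error(reward=1.0, q_selected=0.2, q_next_best=0.0)
    print(gated_hebbian_update(weights, pre, feedback, delta))
```

In this sketch only the connection from the active input to the attended unit changes, which is the sense in which attentional feedback restricts plasticity to the units causally linked to the selected action.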

Articles published on Temporal Credit Assignment

Stable reinforcement learning via temporal competition between LTP and LTD traces

Neural Correlates of Temporal Credit Assignment in the Parietal Lobe

Navigating complex decision spaces: Problems and paradigms in sequential choice.

A Grey Synthesis Approach to Efficient Architecture Design for Temporal Difference Learning

How attention and reinforcers jointly optimize the associations between sensory representations, working memory and motor programs

Spatio-Temporal Credit Assignment in Neuronal Population Learning

Statistical mechanics of structural and temporal credit assignment effects on learning in neural networks

Learning from delayed feedback: neural responses in temporal credit assignment

Prefrontal neurons solve the temporal credit assignment problem during reinforcement learning

Dynamical model of salience gated working memory, action selection and reinforcement based on basal ganglia and dopamine feedback

A network model that can learn reward timing using reinforced expression of synaptic plasticity

Using temporal-difference learning for multi-agent bargaining

Spatial temporal credit assignment neural network model

Models of birdsong learning

Input-output HMMs for sequence processing

Generation of temporal sequences using local dynamic programming

Truncating Temporal Differences: On the Efficient Implementation of TD(lambda) for Reinforcement Learning

A Hierarchical Network of Control Systems that Learn: Modeling Nervous System Function During Classical and Instrumental Conditioning

LEARNING TO GENERATE ARTIFICIAL FOVEA TRAJECTORIES FOR TARGET DETECTION
