Temporal Difference Reinforcement Learning Research Articles

Dopamine release in the nucleus accumbens core (NAcC) is generally considered to be a proxy for phasic firing of dopamine neurons in the ventral tegmental area (VTADA). Thus, dopamine release in NAcC is hypothesized to reflect a unitary role in reward prediction error signalling. However, recent studies revealed more diverse roles of dopamine neurons, which support an emerging idea that dopamine regulates learning differently in distinct circuits. To understand whether the NAcC might regulate a unique component of learning, we recorded dopamine release in NAcC while male rats performed a backward conditioning task where a reward is followed by a cue. We used this task because we can delineate different components of learning, which include sensory-specific inhibitory and general excitatory components. Further, we have shown that VTADA neurons are necessary for both the specific and general components of backward associations. Here, we found that dopamine release in NAcC increased to the reward across learning, while decreasing to the cue that followed as it became more expected. This mirrors the dopamine prediction error signal seen during forward conditioning and cannot be accounted for by temporal-difference reinforcement learning (TDRL). Subsequent tests allowed us to dissociate these learning components and revealed that dopamine release in NAcC reflects the general excitatory component of backward associations, but not their sensory-specific component. These results emphasize the importance of examining distinct functions of different dopamine projections in reinforcement learning.

Significance Statement: Dopamine regulates reinforcement learning. While it was previously believed that this system contributed to simple value assignment to reward cues, we now know dopamine plays increasingly diverse roles in reinforcement learning. How these diverse roles are achieved in distinct circuits is not fully understood. By using behavioural tasks that examine distinctive components of learning separately, we reveal that NAcC dopamine release contributes to a unique component of learning. Thus, the present study supports a distinct role of NAcC in reinforcement learning, consistent with the idea that different dopamine systems serve different learning functions. Examining the roles of different dopamine projections is an important approach to identify neuronal mechanisms underlying the reinforcement-learning deficits observed in schizophrenia and drug addiction.

In the DSM-5, psychiatric diagnoses are made based on self-reported symptoms and clinician-identified signs. Though helpful in choosing potential interventions based on the available regimens, this conceptualization of psychiatric diseases can limit basic science investigation into their underlying causes. The reward prediction error (RPE) hypothesis of dopamine neuron function posits that phasic dopamine signals encode the difference between the rewards a person expects and experiences. The computational framework from which this hypothesis was derived, temporal difference reinforcement learning (TDRL), is largely focused on reward processing rather than punishment learning. Many psychiatric disorders are characterized by aberrant behaviors, expectations, reward processing, and hypothesized dopaminergic signaling, but are also characterized by suffering and the inability to change one's behavior despite negative consequences. In this review, we provide an overview of the RPE theory of phasic dopamine neuron activity and review the gains that have been made through the use of computational reinforcement learning theory as a framework for understanding changes in reward processing. The relative dearth of explicit accounts of punishment learning in computational reinforcement learning theory and its application in neuroscience is highlighted as a significant gap in current computational psychiatric research. Four disorders comprise the main focus of this review: two disorders of traditionally hypothesized hyperdopaminergic function, addiction and schizophrenia, followed by two disorders of traditionally hypothesized hypodopaminergic function, depression and post-traumatic stress disorder (PTSD). Insights gained from a reward-processing-based reinforcement learning framework about underlying dopaminergic mechanisms and the role of punishment learning (when available) are explored in each disorder. Concluding remarks focus on the future directions required to characterize neuropsychiatric disorders with a hypothesized cause of underlying dopaminergic transmission.
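
As a rough illustration of the reward prediction error idea described above, the following is a minimal sketch of tabular TD(0) learning over the time steps of a single repeated trial. It is not taken from either article: the function name `td0_prediction_errors`, the one-state-per-time-step representation, and all parameter values are assumptions chosen for illustration only.

```python
def td0_prediction_errors(n_steps=5, reward_step=4, reward=1.0,
                          alpha=0.1, gamma=1.0, n_trials=200):
    """Tabular TD(0) over a fixed sequence of time steps within a trial.

    Hypothetical setup: each time step is its own state, and a reward
    arrives on leaving state `reward_step`. Returns the learned values
    and the per-step prediction errors recorded on every trial.
    """
    # One value per time step, plus a terminal state whose value stays 0.
    values = [0.0] * (n_steps + 1)
    errors_per_trial = []
    for _ in range(n_trials):
        errors = []
        for t in range(n_steps):
            r = reward if t == reward_step else 0.0
            # Prediction error: received reward plus discounted next-state
            # value, minus what the current state had predicted.
            delta = r + gamma * values[t + 1] - values[t]
            values[t] += alpha * delta
            errors.append(delta)
        errors_per_trial.append(errors)
    return values, errors_per_trial


if __name__ == "__main__":
    values, errors = td0_prediction_errors()
    print("first-trial error at reward:", errors[0][4])
    print("last-trial error at reward: ", errors[-1][4])
    print("learned values:", [round(v, 3) for v in values])
```

In this sketch the error at the time of reward is large on the first trial, when the reward is unexpected, and shrinks toward zero as the earlier time steps come to predict it. Setting `reward` to a negative number would be one simple way to stand in for a punishment, though that is an assumption of this example rather than a claim from the review.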

Articles published on Temporal Difference Reinforcement Learning

Dopamine Release in the Nucleus Accumbens Core Encodes the General Excitatory Components of Learning.

A novel method-based reinforcement learning with deep temporal difference network for flexible double shop scheduling problem

Dopaminergic prediction errors in the ventral tegmental area reflect a multithreaded predictive model.

Intrinsic fluctuations of reinforcement learning promote cooperation

SemiACO: A semi-supervised feature selection based on ant colony optimization

Computational reinforcement learning, reward (and punishment), and dopamine in psychiatric disorders.

Safe couplings: coupled refinement types

Choice-selective sequences dominate in cortical relative to thalamic inputs to NAc to support reinforcement learning.

Task Learnability Modulates Surprise but Not Valence Processing for Reinforcement Learning in Probabilistic Choice Tasks.

Toward Robots’ Behavioral Transparency of Temporal Difference Reinforcement Learning With a Human Teacher

Promoting the Emergence of Behavior Norms in a Principal–Agent Problem—An Agent-Based Modeling Approach Using Reinforcement Learning

Dynamical systems as a level of cognitive analysis of multi-agent learning

Ant-TD: Ant colony optimization plus temporal difference reinforcement learning for multi-label feature selection

Speeding-Up Action Learning in a Social Robot With Dyna-Q+: A Bioinspired Probabilistic Model Approach

Correlation minimizing replay memory in temporal-difference reinforcement learning

Deterministic limit of temporal difference reinforcement learning for stochastic games.

Reinforcement learning in artificial and biological systems

Automated Enemy Avoidance of Unmanned Aerial Vehicles Based on Reinforcement Learning

Towards learning agents with personality traits: Modeling Openness to Experience

An Agent-Based Approach to Interbank Market Lending Decisions and Risk Implications
