Abstract

Generalization is the ability to apply past experience to similar but non-identical situations. It not only affects stimulus-outcome relationships, as observed in conditioning experiments, but may also be essential for adaptive behaviors, which involve the interaction between individuals and their environment. Computational modeling could potentially clarify the effect of generalization on adaptive behaviors and how this effect emerges from the underlying computation. Recent neurobiological observations indicated that the striatal dopamine system achieves generalization and subsequent discrimination by updating the corticostriatal synaptic connections differentially in response to reward and punishment. In this study, we analyzed how the computational characteristics of this neurobiological system affect adaptive behaviors. We proposed a novel reinforcement learning model with multilayer neural networks in which the synaptic weights of only the last layer are updated according to the prediction error. We set fixed connections between the input and hidden layers to maintain the similarity of inputs in the hidden-layer representation. This network enabled fast generalization of reward and punishment learning, and thereby facilitated safe and efficient exploration in spatial navigation tasks. Notably, it demonstrated quick reward approach and efficient punishment aversion in the early learning phase compared with algorithms that do not show generalization. However, disturbance of the network that caused noisy generalization and impaired discrimination induced maladaptive valuation. These results suggested the advantages and potential drawbacks of the computation performed by the striatal dopamine system with regard to adaptive behaviors.
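As a concrete illustration of this architecture, the following sketch is our own minimal reconstruction, not the authors' implementation: it assumes a two-dimensional state given as (x, y) coordinates, a tanh hidden layer with fixed random weights, and TD(0) value learning, and the constants N_INPUT, N_HIDDEN, ALPHA, and GAMMA are placeholder values.

```python
# A minimal sketch (not the authors' implementation) of the architecture described above:
# a fixed input-to-hidden mapping, with TD(0) value learning applied only to the
# hidden-to-output weights, in analogy to corticostriatal plasticity.
import numpy as np

rng = np.random.default_rng(0)

N_INPUT, N_HIDDEN = 2, 100        # assumed: state given as (x, y) coordinates
ALPHA, GAMMA = 0.1, 0.95          # assumed learning rate and discount factor

# Fixed input->hidden connections: never updated, so similar inputs yield
# similar hidden representations and learned values generalize across them.
W_fixed = rng.normal(scale=1.0, size=(N_HIDDEN, N_INPUT))
b_fixed = rng.normal(scale=1.0, size=N_HIDDEN)

# Plastic hidden->output weights: the only parameters changed by the prediction error.
w_value = np.zeros(N_HIDDEN)

def hidden(state):
    """Fixed nonlinear preprocessing of the input state."""
    return np.tanh(W_fixed @ np.asarray(state, dtype=float) + b_fixed)

def value(state):
    """Estimated state value, read out from the last layer only."""
    return w_value @ hidden(state)

def td_update(state, reward, next_state, done=False):
    """TD(0) update confined to the last layer (dopamine-like prediction-error signal)."""
    global w_value
    target = reward + (0.0 if done else GAMMA * value(next_state))
    delta = target - value(state)
    w_value += ALPHA * delta * hidden(state)
    return delta
```

Because only w_value changes during learning, the similarity structure imposed by the fixed layer remains stable, which is the property that supports fast generalization and subsequent discrimination.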

Highlights

  • Animals’ survival involves reward-seeking behavior accompanied by risks

  • OVaRLAP achieves stimulus generalization based on the similarity of states (Figure 2A) because it utilizes a fixed neural network for the preprocessing of the value learning network (a toy illustration follows this list)

  • These results suggest that the OVaRLAP agent learns reward approach and pain aversion very efficiently owing to its proper generalization, although it is inferior to a simple temporal-difference (TD) learner in long-term reward learning and to MaxPain in long-term pain aversion
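The self-contained toy example below shows why a similarity-preserving fixed layer makes a single last-layer value update spread to a nearby state but not to a distant one. Gaussian place-cell-like tuning is assumed here purely for illustration; the paper's actual fixed preprocessing may differ, and all constants are placeholders.

```python
# A toy illustration (not the paper's code) of generalization through fixed preprocessing:
# one value update at a rewarded state spills over to nearby states only.
import numpy as np

rng = np.random.default_rng(1)

N_HIDDEN, SIGMA, ALPHA = 100, 1.5, 0.1                 # assumed sizes and constants
centers = rng.uniform(0.0, 10.0, size=(N_HIDDEN, 2))   # fixed hidden-unit "receptive fields"
w_value = np.zeros(N_HIDDEN)                           # plastic last-layer weights

def phi(s):
    """Fixed preprocessing: unit activity falls off with distance from its center."""
    d2 = np.sum((centers - np.asarray(s, dtype=float)) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * SIGMA ** 2))

def V(s):
    return w_value @ phi(s)

s_rewarded, s_near, s_far = (2.0, 2.0), (2.5, 2.5), (8.0, 8.0)

# One last-layer update after an unexpected unit reward at s_rewarded.
delta = 1.0 - V(s_rewarded)                            # prediction error
w_value += ALPHA * delta * phi(s_rewarded)

print(f"V(rewarded)={V(s_rewarded):.3f}  V(near)={V(s_near):.3f}  V(far)={V(s_far):.3f}")
# The nearby state inherits much of the new value (generalization),
# while the distant state stays near zero (discrimination is preserved).
```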


Introduction

Observing the outcome that results from pairing a current state with a taken action provides clues for optimal behavior, but it may entail substantial energy consumption and aversive experiences. Such a learning process is inefficient and even harmful, especially when animals are required to adapt to new environments. Animals instead generalize their previous experiences to predict outcomes, even in novel situations. Generalization and discrimination may be essential for efficient adaptive behaviors, whereas abnormalities in these functions can be maladaptive. A recent neurobiological study presented the possibility that disrupted discrimination is involved in psychotic symptoms (Iino et al., 2020).
