Abstract
Dopaminergic neurons in the mammalian substantia nigra display characteristic phasic responses to stimuli which reliably predict the receipt of primary rewards. These responses have been suggested to encode reward prediction-errors similar to those used in reinforcement learning. Here, we propose a model of dopaminergic activity in which prediction-error signals are generated by the joint action of short-latency excitation and long-latency inhibition, in a network undergoing dopaminergic neuromodulation of both spike-timing dependent synaptic plasticity and neuronal excitability. In contrast to previous models, sensitivity to recent events is maintained by the selective modification of specific striatal synapses, efferent to cortical neurons exhibiting stimulus-specific, temporally extended activity patterns. Our model shows, in the presence of significant background activity, (i) a shift in dopaminergic response from reward to reward-predicting stimuli, (ii) preservation of a response to unexpected rewards, and (iii) a precisely timed below-baseline dip in activity observed when expected rewards are omitted.
Highlights
The mammalian dopamine (DA) system is implicated in a wide range of cognitive functions.
To advance the “bottom-up” approach, we describe and analyze a model of DA activity in which phasic prediction-error signals are generated through the joint action of excitatory and inhibitory pathways, in a spiking neural network undergoing DA modulation of both spike-timing dependent synaptic plasticity (DA–STDP) and neuronal excitability (DA-modulated post-synaptic facilitation, DA–PSF).
Each stimulus is presented to the network as a distinct pattern of current applied to 50% of the neurons in each of two groups: sensory neurons (SEN) and prefrontal cortex (PFC) (Figure 2).
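The stimulus encoding described in the last highlight can be illustrated with a minimal sketch. It assumes that a "pattern" is simply a random choice of half of the neuron indices in each group. The group sizes, the seeds, and the function name `make_stimulus` are hypothetical and are not taken from the paper, which may select neurons and set current amplitudes differently.

```python
import random


def make_stimulus(n_sen, n_pfc, fraction=0.5, seed=None):
    """Pick the neurons that receive input current for one stimulus.

    Hypothetical helper: a stimulus is represented as the sorted indices of
    the driven neurons in each group (SEN and PFC).
    """
    rng = random.Random(seed)
    # Sample without replacement so that exactly `fraction` of each group is driven.
    sen = sorted(rng.sample(range(n_sen), int(n_sen * fraction)))
    pfc = sorted(rng.sample(range(n_pfc), int(n_pfc * fraction)))
    return {"SEN": sen, "PFC": pfc}


# Two stimuli built from different seeds; the group sizes are illustrative only.
stim_a = make_stimulus(n_sen=100, n_pfc=100, seed=1)
stim_b = make_stimulus(n_sen=100, n_pfc=100, seed=2)
```

Because each stimulus drives half of each group, two independently drawn patterns will typically overlap on some neurons while remaining distinguishable as patterns.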
Summary
The mammalian dopamine (DA) system is implicated in a wide range of cognitive functions. Most computational approaches to modeling DA responses during learning have focused on the “temporal difference” algorithm (Sutton and Barto, 1998; Pan et al., 2005, 2008; Hazy et al., 2010), which computes expected reward using an explicit temporal discount (Sutton and Barto, 1998). “Dual-path” models (Brown et al., 1999; Tan and Bullock, 2008) investigate interactions between complementary excitatory and inhibitory pathways converging on DA neurons. These models involve spiking neural networks but do not rely on the precisely timed spiking activity patterns observed in prefrontal cortex (PFC) and striatum during reinforcement learning (Schultz, 1992; Durstewitz et al., 2000). To advance the “bottom-up” approach, we describe and analyze a model of DA activity in which phasic prediction-error signals are generated through the joint action of excitatory and inhibitory pathways, in a spiking neural network undergoing DA modulation of both spike-timing dependent synaptic plasticity (DA–STDP) and neuronal excitability (DA-modulated post-synaptic facilitation, DA–PSF).
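For readers unfamiliar with the temporal difference approach mentioned above, the following is a minimal sketch of textbook tabular TD(0) on a single cue-then-reward trial. It is not the spiking model proposed in this paper. The trial length, the discount factor `gamma`, the learning rate `alpha`, and the function names are illustrative assumptions, and the cue is assumed to arrive unpredictably, so the value before the cue is taken as 0.

```python
def run_trial(values, reward, gamma=0.98, alpha=0.1):
    """Run one cue-reward trial of tabular TD(0).

    `values[t]` is the learned value of time step t after cue onset.
    Returns (prediction error at cue onset, prediction error at outcome time).
    """
    n_steps = len(values)
    # The cue is unpredicted, so the pre-cue value is assumed to be 0.
    cue_error = gamma * values[0]
    outcome_error = 0.0
    for t in range(n_steps):
        is_last = t == n_steps - 1
        r = reward if is_last else 0.0
        next_value = 0.0 if is_last else values[t + 1]
        # TD error with an explicit temporal discount gamma.
        delta = r + gamma * next_value - values[t]
        values[t] += alpha * delta
        if is_last:
            outcome_error = delta
    return cue_error, outcome_error


def simulate(n_trials=2000, n_steps=10):
    """Train on rewarded trials, then probe a single trial with the reward omitted."""
    values = [0.0] * n_steps
    first = run_trial(values, reward=1.0)
    trained = first
    for _ in range(n_trials - 1):
        trained = run_trial(values, reward=1.0)
    omitted = run_trial(values, reward=0.0)
    return first, trained, omitted


first, trained, omitted = simulate()
```

Under these assumptions, the error on the first trial occurs at reward delivery, after training it occurs at cue onset instead, and omitting an expected reward produces a negative error at the expected reward time. These correspond to the three response properties that the abstract lists for the proposed model, which obtains them without an explicit temporal discount.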