TD learning versus motivational salience accounts of dopamine in animal models of OCD

Becker Sue

doi:10.3389/conf.neuro.06.2009.03.032

Abstract

Disruption of the dopaminergic (DA) system is linked to a wide range of neurological disorders, including attention deficit disorder, obsessive compulsive disorder (OCD), schizophrenia and Parkinson's disease. Reinforcement learning theory has made an influential contribution to understanding this system. It uses the temporal-difference (TD) learning algorithm to model the correlation between dopamine cell firing rates in the dorsal striatum and reward prediction errors in associative learning. In its simplest form, the TD model treats the DA signal as conveying the difference between an expected reward and an actual reward. Another influential theory holds that DA signals surprise and motivational salience, and therefore predicts increased firing in response to improbable events [1].
To differentiate further between these two formulations of the DA signal, we examined an animal model of OCD. Rats injected with Quinpirole, a dopamine D2/D3 receptor agonist, exhibit compulsive checking behaviour [2]. The injection results in hyperactivation of the DA signal. We modeled this compulsive checking behaviour using both the TD model and the surprise/salience model of the DA signal. In the TD model, compulsive checking is reinforced because the reward prediction error signal misreports the behaviour as increasingly rewarding. In the salience/surprise model, compulsive checking is induced when the animal perceives something improbable happening during normal activity, and it continues until a check produces no unexpected consequence. Under Quinpirole, the DA signal constantly reports that improbable events are occurring, so the checking behaviour is never turned off. We found that both models showed similar compulsive checking while the DA signal was hyperactivated, but differed significantly after the hyperactivation was turned off.
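The two formulations of the DA signal described above can be contrasted in a short Python sketch. It is not the authors' model. It assumes a TD(0) prediction error, delta = r + gamma * V(s') - V(s), uses Shannon surprise -log p as a stand-in for the surprise/salience signal, and represents the Quinpirole-induced hyperactivation as a hypothetical additive `da_boost` term. All function names and parameter values are illustrative.

```python
import math

# Hypothetical parameters for illustration only; they are not the authors' values.
ALPHA = 0.1   # TD learning rate
GAMMA = 0.9   # discount factor


def td_error(reward, v_next, v_current, da_boost=0.0):
    """TD(0) reward prediction error, delta = r + gamma * V(s') - V(s).

    `da_boost` is a hypothetical additive term standing in for the
    Quinpirole-induced hyperactivation of the DA signal.
    """
    return reward + GAMMA * v_next - v_current + da_boost


def surprise(p_outcome, da_boost=0.0):
    """Surprise signal modelled (an assumption) as Shannon surprise -log p,
    plus the same hypothetical hyperactivation term."""
    return -math.log(p_outcome) + da_boost


def td_check_value(trials, da_boost):
    """Learned value of 'checking': a one-step action that earns no real
    reward (r = 0) and ends the episode (V(s') = 0)."""
    v_check = 0.0
    for _ in range(trials):
        v_check += ALPHA * td_error(0.0, 0.0, v_check, da_boost)
    return v_check


def salience_keeps_checking(p_outcome, da_boost, threshold=1.0):
    """Checking continues while a check's outcome still elicits surprise
    above a (hypothetical) threshold."""
    return surprise(p_outcome, da_boost) > threshold


if __name__ == "__main__":
    # With the drug, the TD error keeps reporting checking as rewarding,
    # so its value climbs toward da_boost; without it, the value stays at 0.
    print(td_check_value(50, da_boost=1.0))  # close to 1.0
    print(td_check_value(50, da_boost=0.0))  # 0.0
    # An expected outcome (p = 0.95) only stays "surprising" under the drug.
    print(salience_keeps_checking(0.95, da_boost=2.0))  # True
    print(salience_keeps_checking(0.95, da_boost=0.0))  # False
```

In this sketch both mechanisms sustain checking while `da_boost` is on, matching the abstract's observation that the two models behave alike under hyperactivation.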
When the DA signal was not hyperactivated, the TD model learned gradually that checking was no longer rewarding: the checking behaviour was frequent at first but decreased significantly over time. In the salience/surprise model, checking ceased as soon as the hyperactivation was turned off, because performing the check no longer elicited a surprise signal. We then compared the results of the two models with a preliminary analysis of the behaviour of rats that had previously received injections of Quinpirole, during a trial in which no drug was delivered. The frequency of checking appeared to drop off immediately and did not show a significant decrease over time. This suggests that the surprise/salience model of the DA signal predicted the animals' behaviour more accurately than the TD model. This evidence that the DA signal might be best modeled as conveying surprise and motivational salience has important implications for our understanding of learning, action choice, and how disruptions in the dopaminergic system contribute to neurological disorders.
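The drug-free comparison can be sketched in the same spirit. This is again a hypothetical illustration, not the study's simulation. It tracks the expected checking rate per trial under an assumed saturating choice rule (`check_prob`), starts the TD learner from an illustrative checking value of 1.0 acquired under the drug, and treats the outcome of a check as highly probable (p = 0.95) once the drug is withdrawn. All names and parameter values are assumptions.

```python
import math

# Hypothetical parameters for illustration only; they are not the authors' values.
ALPHA = 0.1      # TD learning rate
BETA = 5.0       # how sharply the drive to check raises checking probability
BASELINE = 0.05  # checking probability when there is no drive to check
THRESHOLD = 1.0  # surprise needed to trigger checking in the salience model


def check_prob(drive):
    """Checking probability: baseline plus a saturating function of drive >= 0."""
    return BASELINE + (1.0 - BASELINE) * math.tanh(BETA * max(drive, 0.0))


def td_drug_off(v_check, trials):
    """Expected checking rate per trial after the drug is withdrawn (TD model).

    The value of checking is only updated when a check is performed, so the
    expected TD(0) update (reward 0, terminal next state, no drug boost) is
    scaled by the probability of checking on that trial.
    """
    rates = []
    for _ in range(trials):
        p = check_prob(v_check)
        rates.append(p)
        v_check += p * ALPHA * (0.0 - v_check)
    return rates


def salience_drug_off(p_outcome, trials):
    """Expected checking rate per trial after the drug is withdrawn
    (salience model): a check now yields only ordinary surprise -log p,
    so the drive to check does not depend on earlier trials."""
    drive = -math.log(p_outcome) - THRESHOLD
    return [check_prob(drive) for _ in range(trials)]


if __name__ == "__main__":
    td = td_drug_off(1.0, 100)
    sal = salience_drug_off(0.95, 100)
    print("TD first/last:", td[0], td[-1])
    print("salience first/last:", sal[0], sal[-1])
```

With these settings the TD rate starts near 1 and declines over many trials, while the salience rate sits at the baseline from the first drug-free trial. This mirrors, qualitatively, the gradual versus immediate drop-off that the abstract contrasts.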
