Abstract
Avoidance behaviors, in which a learned response causes omission of an upcoming punisher, are a core feature of many psychiatric disorders. While reinforcement learning (RL) models have been widely used to study the development of appetitive behaviors, less attention has been paid to avoidance. Here, we present an RL model of lever-press avoidance learning in Sprague-Dawley (SD) rats and in the inbred Wistar Kyoto (WKY) rat, which has been proposed as a model of anxiety vulnerability. We focus on “warm-up,” transiently decreased avoidance responding at the start of a testing session, which is shown by SD but not WKY rats. We first show that an RL model can correctly simulate key aspects of acquisition, extinction, and warm-up in SD rats; we then show that WKY behavior can be simulated by altering three model parameters, which respectively govern the tendency to explore new behaviors vs. exploit previously reinforced ones, the tendency to repeat previous behaviors regardless of reinforcement, and the learning rate for predicting future outcomes. This suggests that several dissociable mechanisms may contribute independently to strain differences in behavior. The model predicts that, if the “standard” inter-session interval is shortened from 48 to 24 h, SD rats (but not WKY) will continue to show warm-up; we confirm this prediction in an empirical study with SD and WKY rats. The model further predicts that SD rats will continue to show warm-up with inter-session intervals as short as a few minutes, while WKY rats will not show warm-up even with inter-session intervals as long as a month. Together, the modeling and empirical data indicate that strain differences in warm-up are qualitative rather than just the result of differential sensitivity to task variables. Understanding the mechanisms that govern expression of warm-up behavior in avoidance may lead to better understanding of pathological avoidance, and of potential pathways to modify these processes.
Highlights
We first show that a reinforcement learning (RL) model can correctly simulate key aspects of acquisition, extinction, and warm-up in SD rats; we then show that Wistar Kyoto (WKY) behavior can be simulated by altering three model parameters, which respectively govern the tendency to explore new behaviors vs. exploit previously reinforced ones, the tendency to repeat previous behaviors regardless of reinforcement, and the learning rate for predicting future outcomes.
Because WKY rats have reduced mesolimbic dopamine function (Jiao et al., 2003), a system that has been implicated in generating the prediction error signal in RL (Hollerman and Schultz, 1998; Schultz and Dickinson, 2000), we reduced the learning rate α at which the critic updates its weights based on the prediction error.
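For reference, a minimal sketch of what this update looks like under the conventional temporal-difference (TD) formulation; the symbols below follow standard TD notation and are an assumption here, not a reproduction of the paper's exact state representation:

    \delta_t = r_t + \gamma \hat{V}(s_{t+1}) - \hat{V}(s_t), \qquad \hat{V}(s_t) \leftarrow \hat{V}(s_t) + \alpha \, \delta_t

Under this reading, reducing α simply makes the critic weight each prediction error δ_t less heavily, so expected outcomes are revised more slowly from trial to trial.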
Summary
We first show that an RL model can correctly simulate key aspects of acquisition, extinction, and warm-up in SD rats; we then show that WKY behavior can be simulated by altering three model parameters, which respectively govern the tendency to explore new behaviors vs. exploit previously reinforced ones, the tendency to repeat previous behaviors regardless of reinforcement, and the learning rate for predicting future outcomes. This suggests that several dissociable mechanisms may contribute independently to strain differences in behavior. In lever-press avoidance, a rat is placed in a conditioning chamber for several acquisition trials; on each trial, a warning signal W, such as a tone, is presented for some interval (the warning period) and remains on during a subsequent shock period.
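As a rough illustration of how the three parameters described above could enter such a model, the sketch below assumes a conventional softmax actor-critic; the function names, parameter names (T for the exploration/exploitation temperature, kappa for the perseveration bias, alpha for the critic learning rate), and default values are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def choose_action(actor_weights, prev_action=None, T=1.0, kappa=0.5, rng=np.random):
        """Softmax over action propensities, biased toward repeating the previous action."""
        prefs = np.asarray(actor_weights, dtype=float).copy()
        if prev_action is not None:
            prefs[prev_action] += kappa            # perseveration: repeat regardless of reinforcement
        p = np.exp((prefs - prefs.max()) / T)      # larger T -> flatter distribution -> more exploration
        p /= p.sum()
        return rng.choice(len(prefs), p=p)

    def critic_update(V, s, s_next, r, alpha=0.1, gamma=0.95):
        """TD(0) critic: move V[s] toward r + gamma * V[s_next] at learning rate alpha."""
        delta = r + gamma * V[s_next] - V[s]       # prediction error
        V[s] += alpha * delta
        return delta

Strain differences could then be probed by varying T, kappa, and alpha independently while holding the simulated task fixed.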