Abstract

The subthalamic nucleus (STN) is hypothesized to play a central role in neural processes that regulate self-control. Still uncertain, however, is how that brain structure participates in the dynamically evolving estimation of value that underlies the ability to delay gratification and wait patiently for a gain. To address that gap in knowledge, we studied the spiking activity of neurons in the STN of monkeys during a task in which animals were required to remain motionless for varying periods of time in order to obtain food reward. At the single-neuron and population levels, we found a cost–benefit integration between the desirability of the expected reward and the imposed delay to reward delivery, with STN signals that dynamically combined both attributes of the reward to form a single integrated estimate of value. This neural encoding of subjective value evolved dynamically across the waiting period that intervened after the instruction cue. Moreover, this encoding was distributed inhomogeneously along the antero-posterior axis of the STN such that the most dorso-posterior-placed neurons represented the temporally discounted value most strongly. These findings highlight the selective involvement of the dorso-posterior STN in the representation of temporally discounted rewards. The combination of rewards and time delays into an integrated representation is essential for self-control, the promotion of goal pursuit, and the willingness to bear the costs of time delays.

Editor's evaluation

This study provides valuable information regarding the neurophysiological basis of self-control. The authors recorded single-neuron activity in the subthalamic nucleus of monkeys.
The authors found neurons whose activity was modulated by reward magnitudes and delays.

https://doi.org/10.7554/eLife.83971.sa0

Introduction

Imagine you are standing in a queue in front of a bakery. How long are you willing to wait for your favorite pastry? Many of us lose patience after about 5 min, while others persevere and keep waiting through longer delays. Our individual ability to delay gratification and maintain self-control depends on an internal process that continuously estimates the trade-off between the desirability of the expected benefit and the cost of waiting (Ainslie, 1975). All animals, including humans, prefer to receive rewards sooner rather than later, a phenomenon known as temporal discounting (Frederick et al., 2002; Loewenstein and Prelec, 1993; Mazur, 2001; Vanderveldt et al., 2016). Accordingly, people with low discount rates tend to pursue their long-term goals patiently, whereas people with high discount rates often abandon their goals impulsively and move on (Janakiraman et al., 2011). In economic behavior, the net payoff for such a cost–benefit dilemma is typically evaluated by integrating the magnitude of the future reward with a hyperbolic discounting function (Green and Myerson, 2004; Kirby, 1997; Loewenstein et al., 1992).

Most studies of the neural correlates of temporal discounting have focused on the task instruction period, the point in a trial when the subject is informed of the size of the reward to be delivered and of the delay in time until its delivery (Berns et al., 2007). Far less is known about neuronal activity during the subsequent post-instruction delay period, during which subjects may exhibit varying degrees of patience (e.g., self-control) and anticipation of reward. It is quite possible that neuronal activity related to those factors also evolves dynamically across this time period.
For example, as the subjective value of a future reward is updated with the passage of time, the motivation to achieve a delayed goal may vary gradually. Indeed, functional imaging studies in humans suggest that neural activity related to temporal discounting evolves dynamically during a post-instruction delay period in patterns that differ distinctly between brain regions (Jimura et al., 2013; McGuire and Kable, 2015; Tanaka et al., 2020). How such dynamically evolving encodings of temporally discounted subjective value are instantiated at the single-unit level remains poorly understood. Because the subthalamic nucleus (STN) is thought to be crucial in inhibitory control by preventing impulsivity (Aron et al., 2016; Bonnevie and Zaghloul, 2019; Jahanshahi et al., 2015) and modulating the performance of reward-seeking actions (Baunez et al., 2007; Baunez and Robbins, 1997), we hypothesized that this structure could contribute to the maintenance of adaptive behaviors by dynamically computing the temporally discounted value. The STN occupies a unique position for translating motivational drives into behavioral perseverance, standing at the crossroads between the basal ganglia indirect pathway and many hyperdirect inputs from prefrontal areas involved in motivational, cognitive and motor functions (Haynes and Haber, 2013; Parent and Hazrati, 1995). Current functional models of the STN propose that increased activity in the STN extends the time to action initiation by elevating decision thresholds, preventing suboptimal early responses or decisions, especially in situations in which the motivational options are conflicting (Cavanagh et al., 2011; Frank, 2006; Mansfield et al., 2011). 
In support of these models, a series of lesion studies performed on rats provided causal evidence that STN restrains premature responding in instrumental tasks (Baunez and Robbins, 1997; Wiener et al., 2008) and controls the willingness to work for food (Baunez et al., 2005; Baunez et al., 2002). Dysfunctions of STN circuits even produced perseverative actions with a reduced ability to switch between behaviors (Baker and Ragozzino, 2014; Baunez et al., 2007), making this brain region a good candidate for regulating self-control and delayed gratification. Until now, however, existing evidence is mixed on whether the STN is involved in temporal discounting (Aiello et al., 2019; Evens et al., 2015; Seinstra et al., 2016; Seymour et al., 2016; Uslaner and Robinson, 2006; Voon et al., 2017; Winstanley et al., 2005), and no previous study has investigated how STN neurons process value information across delays. Aside from its role in motor control, clinical studies support the involvement of the STN in motivational functions. In particular, deep brain stimulation (DBS-STN), which is effective at alleviating motor symptoms in parkinsonian patients, may induce a variety of side effects related to altered motivation such as depression, excessive eating behavior, and hypomania (Berney et al., 2002; Castrioto et al., 2014; Jahanshahi et al., 2015; Voon et al., 2006). Electrophysiological recordings collected from the STN of these patients have shown low-frequency oscillations (<12 Hz) and spiking activities related to various aspects of reward processing, with neural signals modulated by the magnitude of monetary reward and cost–benefit value attribution (Fumagalli et al., 2015; Justin Rossi et al., 2017; Zénon et al., 2016). 
In non-human animals, the ability of the STN to represent the subjective desirability of actions has also been evidenced by studies that show neurons firing as a function of the expected reward and the associated effort cost (Breysse et al., 2015; Espinosa-Parrilla et al., 2013; Nougaret et al., 2022). Although substantial effort has been directed to elucidate the role of the STN in valuation-related processes at the time of decision-making (Cavanagh et al., 2011; Coulthard et al., 2012; Frank et al., 2007) and in movement incentive (Nougaret et al., 2022; Tan et al., 2015; Zénon et al., 2016), much less attention has been paid to STN involvement in the computation of temporally discounted value during a waiting period, when behavioral inhibition must be sustained patiently over time. In addition, it is still unclear how these roles for STN in cost–benefit valuation and motivational processing relate to the known organization of this nucleus into anatomically and functionally distinct territories (Alexander et al., 1990; Nambu et al., 2002; Parent and Hazrati, 1995). To determine whether the STN conveys signals consistent with its predicted role in pursuing delayed gratification, we trained two monkeys to perform a delayed reward task in which animals were required to remain motionless during post-instruction delay periods of varying durations in order to obtain food reward. We hypothesized that STN neurons exhibit a dynamic encoding of temporally discounted value over the time course of the delay period consistent with a continuously evolving value attribution essential for self-control. Here we tested this hypothesis by studying spiking activity in the STN while monkeys performed the task. 
At the single-neuron and population levels, our results support a role for the STN in temporal discounting and indicate that neural signals underlying the valuation of reward size and delay are integrated dynamically into subjective value along an antero-posterior axis in this nucleus. Such dynamic value integration through the STN may regulate the expression of persistent behaviors for which a continuously evolving cost–benefit estimation is required to monitor and sustain goal achievement.

Results

Monkeys’ behavior reflects reward size and delay in an integrated manner

Two monkeys (H and C) were trained to perform a delayed reward task in which they were required to align a cursor on a visual target and to maintain this arm posture for varying periods of time before delivery of food rewards (Figure 1A). At the beginning of each trial, an instruction cue appeared transiently signaling one of six possible reward contingencies. Cue colors indicated the size of reward (one, two, or three drops of food) and symbols indicated the delay-to-reward (short delay [3.5–5.6 s] or long delay [5.2–7.3 s]). Animals were given the option to reject a proposed trial by moving the cursor outside of the target (e.g., if they did not think it was worth waiting for the expected quantity of reward). In this task, the rejection rate (i.e., the proportion of trials with a failure to keep the cursor in the target) reflects the monkey’s motivation to stay engaged in the task and to successfully complete the trial according to its prediction about the forthcoming reward. The six instruction cues effectively communicated six different levels of motivation or subjective value as evidenced by consistent effects on the animals’ task performance (Figure 1B and C). Rejection rates were affected by both reward size (two-way ANOVAs; monkey H: F(2,666) = 10.47, p<0.001; monkey C: F(2,708) = 5.36, p=0.0049) and delay to reward (H: F(1,666) = 22.62, p<0.001; C: F(1,708) = 8.03, p=0.0047).
Although the total proportion of rejected trials differed across monkeys (two-sample t-test; t(229) = 3.96, p<0.001), a similar behavioral pattern was observed in both animals during the task. The proportion of rejected trials was higher for smaller rewards and longer delays, while both animals waited more patiently to obtain larger rewards.

Figure 1 (with 1 supplement). Delayed reward task and behavioral performance. (A) Temporal sequence of task events. After the monkey initiated a trial by positioning a cursor (+) within a visual target (gray circle), an instruction cue was presented briefly signaling the reward size (one, two, or three drops of food) and the delay-to-reward (short or long). The animal was required to maintain the cursor position over the waiting period to successfully obtain reward. (B, C) Rejection rates (mean ± SEM) were calculated and averaged for the six possible reward contingencies across sessions. Measures were affected by both reward size and delay (two-way ANOVA). Size: ***p<0.001, **p<0.01; delay: ###p<0.001, ##p<0.01. (D–G) For each animal, a temporal discount factor (k) was found that yielded the best fit between averaged rejection rates and the hyperbolic model expressed by Equation 2. Goodness of fit was evaluated by the coefficient of determination (R²). (H, I) EMG signals collected in monkey H were aligned on the presentation of cues. The effects of reward size and delay were examined using a series of two-way ANOVAs. Red lines indicate the statistical threshold (p<0.05/173; Bonferroni correction).

Interactive effects between reward size and delay (H: F(2,666) = 19.31, p<0.001; C: F(2,708) = 10.31, p<0.001) revealed an integration of both task parameters to estimate the overall desirability or subjective value of each cost–benefit condition.
To characterize how subjective value declined with delay, we fitted the averaged rejection rates to a hyperbolic discounting model (as expressed by Equation 2). To be more specific, we inferred the temporal discount factor (k) that maximized the inverse relation between each monkey’s behavior and the subjective value calculated from a hyperbolic function. Consistent with other monkey studies (Hori et al., 2021; Minamimoto et al., 2009), the animals’ task performance was well approximated by an inverse relation with hyperbolic delay discounting (H: R² = 0.98; C: R² = 0.86; Figure 1D and E). The resulting discount rates calculated for the two animals were relatively similar in value (Figure 1F and G). In comparison, however, monkey C was a bit more impatient with a steeper delay discounting (k = 1.62 s⁻¹), while the subjective value estimated by monkey H was slightly less impacted by the cost of waiting (k = 1.28 s⁻¹).

Controls

We recorded EMGs from different muscles (trapezius, deltoid, pectoralis, triceps, biceps) while monkey H performed the behavioral task. During the post-instruction waiting interval, when the animal remained static, the maintenance of the arm posture resulted in a slight increase in the tonic activity of shoulder muscles (Figure 1H and I). As evidenced by a series of two-way ANOVAs (reward × delay, p<0.05/173 time bins), muscle patterns were not altered by reward contingencies. This suggests that monkeys controlled their posture with a constant motor output across trial conditions, independent of reward size and delay. In contrast, because monkeys were not required to control their gaze while performing the task, their eye positions varied according to the type of trial (Figure 1, figure supplement 1). Eye position was affected by both the expected reward size and delay after the presentation of instruction cues (two-way ANOVAs, p<0.05/173 time bins).
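The hyperbolic fit of rejection rates described above (Figure 1D–G) can be illustrated with a short sketch. Equation 2 is not reproduced in this section, so the code assumes a common form in which subjective value is SV = R / (1 + kD) and the rejection rate is proportional to 1/SV. The function names, the grid search, the use of delay-range midpoints, and the synthetic rejection rates are all illustrative assumptions and may differ from the authors' procedure.

```python
# Sketch only: assumes SV = R / (1 + k * D) and rejection rate proportional to 1 / SV.
# Equation 2 of the paper is not shown in this section, so this form is an assumption.

def subjective_value(reward, delay, k):
    """Hyperbolically discounted value of `reward` drops delivered after `delay` seconds."""
    return reward / (1.0 + k * delay)


def r_squared(x, y):
    """Coefficient of determination of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def fit_discount_factor(rewards, delays, rejection_rates, k_grid):
    """Grid-search the k whose 1/SV values best predict the rejection rates linearly."""
    best_k, best_r2 = None, -1.0
    for k in k_grid:
        inverse_sv = [1.0 / subjective_value(r, d, k) for r, d in zip(rewards, delays)]
        r2 = r_squared(inverse_sv, rejection_rates)
        if r2 > best_r2:
            best_k, best_r2 = k, r2
    return best_k, best_r2


# Six conditions: 1-3 drops x short/long delay. Delays are the midpoints of the
# ranges given in the text (3.5-5.6 s and 5.2-7.3 s), used here only for illustration.
rewards = [1, 2, 3, 1, 2, 3]
delays = [4.55, 4.55, 4.55, 6.25, 6.25, 6.25]

# Synthetic rejection rates generated with a known k (not the monkeys' data).
TRUE_K = 1.3
rates = [0.02 / subjective_value(r, d, TRUE_K) for r, d in zip(rewards, delays)]

k_grid = [i / 10 for i in range(0, 31)]  # candidate k values from 0.0 to 3.0 per second
k_hat, r2 = fit_discount_factor(rewards, delays, rates, k_grid)
```

On noise-free synthetic data the search recovers the generating k. With real session averages, a finer grid or a continuous optimizer would likely be preferable, and the resulting R² plays the role of the goodness-of-fit value reported in the text.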
Reward-by-delay interactions detected in eye position after instruction offset reinforce the view that cost–benefit parameters were integrated into a common valuation by monkeys. To confirm the ability of our animals to recognize and evaluate appropriately the different instruction cues, the animals also performed a variant of the task that required decision-making. In this variant, the monkey was allowed to choose freely between two alternate reward size or delay conditions. We observed appropriately strong preferences for the cues that predicted large rewards and short delays. Monkeys selected the more advantageous option in terms of reward when the delays were equal (H: 97%, t(14) = 26.99, p<0.001; C: 99%, t(16) = 30.55, p<0.001) and the more advantageous delay option when reward sizes were held constant (H: 99%, t(11) = 29.71, p<0.001; C: 95%, t(10) = 24.82, p<0.001).

Neuronal activity of STN reflects reward size and delay

While the monkeys performed the delayed reward task, we recorded single-unit activity from 231 neurons in the right STN (112 from monkey H; 119 from monkey C). Similar to our previous study (Pasquereau and Turner, 2017), STN neurons were identified based on location and standard electrophysiological criteria (Figure 2A–C). Most STN neurons exhibited changes in firing rate at one or more times in the task. Approximately 42% of neurons demonstrated a peak in activity in the first second following presentation of the instruction cues, while 32% of neurons exhibited highest discharge rates later during the waiting period (Figure 2D). Although phasic changes evoked by instruction cues dominated the population-averaged activity (Figure 2E), we found that the variability of neuronal activities across the six reward/delay conditions was maintained at an elevated constant level across several seconds of the trial, as evidenced by the Fano factor shown in Figure 2F.
This suggests that task-relevant information was processed by STN neurons not only immediately after the presentation of the instruction cue, but also later in the course of the post-instruction delay period, when the animal was maintaining a stable arm position in anticipation of reward delivery.

Figure 2. Subthalamic nucleus (STN) neurons were modulated by reward size and delay. (A, B) Reconstruction of a trajectory used for STN recordings with a structural MRI and high-resolution 3-D templates of individual nuclei derived from Martin and Bowden, 1996. Globus pallidus (GP), substantia nigra (SN), zona incerta (ZI), and thalamus (THL). (C) Sample of action potential waveforms emitted by STN neurons. (D) Color map histograms of neuronal activities recorded from the STN. Each horizontal line indicates neural activity aligned to instruction cues averaged across trial types. Neuronal firing rates were Z-score normalized. (E) Population-averaged activities of STN neurons, and (F) Fano factors that showed the variability of the neural population ensemble across the six possible reward contingencies. The width of the curves indicates the population SEM. (G–L) Influence of reward size and delay on individual neural activities was detected by a series of two-way ANOVAs (p<0.05/173, Bonferroni correction). The time course of encoding of task-relevant information (left column) and the fractions of neurons modulated by reward size and/or delay (right column) were represented for each time bin. Pie charts show the total fraction of STN neurons influenced by reward size (blue), delay (green), or both task parameters simultaneously (gray).

To determine whether and when STN neurons were involved in the evaluation of different task conditions, we tested the neural activities for effects of reward size and delay using two-way ANOVAs combined with a sliding window procedure (p<0.05/174 time bins).
Because the variability of neuronal activities across task conditions was sustained over time, we analyzed the spiking activity of each neuron in a time-resolved way across a continuous 3.5 s period following the presentation of the instruction cue. Of the 231 neurons recorded, 112 (48%; 21 from monkey H and 91 from monkey C) and 91 (39%; 18 from monkey H and 73 from monkey C) were modulated by reward size and delay, respectively (Figure 2G–J). Interestingly, the two types of encoding occurred preferentially during different periods of the trial. Immediately following cue presentation, neurons were strongly influenced by the reward size signaled by the instruction while, later in the trial, as animals endured the waiting period, encoding of delay became more common. Among the 79 neurons (34%; 14 from monkey H and 65 from monkey C) sensitive to both parameters at some point over the course of the trial, 50 (22%; 8 from monkey H and 42 from monkey C) were influenced simultaneously by reward size and delay within the same time bins, thereby reflecting a direct integration of cost–benefit conditions by individual neurons (Figure 2K and L). Reward-by-delay interactions were scattered in a roughly uniform distribution across the course of the post-instruction period. Overall, these results suggest that the way STN neurons represented task conditions evolved dynamically across the course of a trial.

Dynamic encodings of reward size and delay

To examine how reward size and delay were encoded by individual STN units and how that encoding changed across time in a trial, we performed time-resolved linear regressions with single-unit neural activity as the dependent variable. For each task-related neuron (i.e., neurons encoding at least one task parameter for at least one time bin, n = 124), we tested whether the firing rate was modulated by the expected reward quantity and the delay to reward delivery (as expressed by Equation 3).
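To make the time-resolved regression concrete, the sketch below fits firing rate against reward size and delay within each time bin by ordinary least squares. Equation 3 is not reproduced in this section, so the model form (an intercept plus one coefficient each for reward and delay, with no nuisance terms), the 0/1 coding of delay, the function names, and the toy data are assumptions made for illustration only.

```python
# Sketch only: rate = b0 + b_reward * reward + b_delay * delay, fitted per time bin.
# The exact form of Equation 3 is not given in this section; this is an assumed model.

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for j in range(col, n + 1):
                m[row][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_bin(rates, rewards, delays):
    """Return (b_reward, b_delay) from an OLS fit of one time bin via normal equations."""
    design = [[1.0, float(r), float(d)] for r, d in zip(rewards, delays)]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, rates)) for i in range(3)]
    _, b_reward, b_delay = solve_linear_system(xtx, xty)
    return b_reward, b_delay


def sliding_regression(rates_by_bin, rewards, delays):
    """Apply fit_bin to every time bin; rates_by_bin[t][i] is trial i's rate in bin t."""
    return [fit_bin(bin_rates, rewards, delays) for bin_rates in rates_by_bin]


# Toy data: six trials, one per reward/delay condition (delay coded 0 = short, 1 = long).
rewards = [1, 2, 3, 1, 2, 3]
delays = [0, 0, 0, 1, 1, 1]
# Bin 0: rate rises with reward and falls with delay; bin 1: no modulation.
rates_by_bin = [
    [10.0 + 4.0 * r - 3.0 * d for r, d in zip(rewards, delays)],
    [10.0] * 6,
]
betas = sliding_regression(rates_by_bin, rewards, delays)
```

Each entry of `betas` is one (reward, delay) coefficient pair per time bin, which is the kind of time series that the later vector analysis operates on. Extra regressors would be added as additional columns of the design matrix.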
Because the STN contains an oculomotor territory (Matsumura et al., 1992), we included measures of eye movements (i.e., gaze position and gaze velocity) in our model as nuisance variables. (Exclusion of eye parameters from this analysis produced very similar results; see Figure 4, figure supplement 1.) As illustrated in Figure 3 with three example neurons, task parameters were encoded in the STN at different stages of the trial following different modalities. (Thresholds for significant regression coefficients were calculated relative to their values during the pre-instruction period using a one-sample t-test, df = 46, p<0.05.) Based on the polarity of the regression coefficients βReward, we found neurons whose activity transiently indexed reward size by increasing (e.g., neuron #1) or decreasing (e.g., neuron #2) their firing rate. Similarly, by detecting changes in the regression coefficients βDelay, we found neurons that increased (e.g., neuron #2) or decreased (e.g., neuron #1) their activity as a function of the delay to reward. Neural activities were often influenced in opposite directions by the predicted amount of reward and the delay (positive βReward with negative βDelay, or vice versa). The specific pattern of task encoding within individual cells, however, often changed over the course of the trial. For example, in the third exemplar unit activity shown in Figure 3 (right column), the influence of reward size on firing rate (i.e., βReward) reversed repeatedly in the post-instruction epoch. This type of variability in the regression coefficients impeded simple approaches for categorization of STN neurons via their pattern of encodings (e.g., positive reward encoding vs. negative).

Figure 3. Response of subthalamic nucleus (STN) neurons to the six possible reward contingencies. (A) The activity of three exemplar neurons that were classified as task-related cells.
Spike density functions and raster plots were constructed separately around the presentation of instruction cues for the different cost–benefit conditions. (B) A sliding window regression analysis compared firing rates between trial types (as expressed by Equation 3). The regression coefficients (yellow-to-black lines) were used to characterize the dynamic encoding of reward size (βReward) and delay (βDelay). The horizontal dashed lines indicate the statistical threshold for significant β values (calculated from the pre-instruction period with a one-sample t-test, df = 46, p<0.05). (C) Time series of regression coefficients projected into an orthogonal space where reward size and delay composed the two dimensions. Vector time series were produced for significant β values. Black dashed lines indicate statistical thresholds. The angle (θ) of the vector sum (red dashed lines) was calculated to identify how neurons integrated cost–benefit conditions during the two consecutive phases of the waiting period (phase 1: 0–2 s, phase 2: 2–3.5 s).

Mixed encodings of reward size and delay

To gain deeper insight into how the reward and delay dimensions of the task were integrated by a neuron’s activity, we projected the unit’s time series of regression coefficients from Equation 3 (βReward and βDelay) into a space in which reward size and delay compose two orthogonal dimensions. For each neuron, vector time series were produced in this regression space for significant β values (p<0.05) to capture the moment-by-moment mixture of encodings (Figure 3C). In this space, vector angles indicated how a neuron’s activity reflected the combined effects of reward size and delay, while vector magnitude captured the strength of the combined encoding.
To determine the predominant encoding of these two characteristics (angle and magnitude of moment-by-moment vectors) and their evolution during the post-instruction epoch, we summed the time-resolved vectors within each of two consecutive phases of the waiting period (e.g., red dashed lines for phase 1 [0–2 s post-instruction] and phase 2 [2–3.5 s post-instruction] in Figure 3C). The angles (θ1 and θ2) of the resulting vector sums were used to identify consecutive patterns of activity consistent with, and those inconsistent with, encoding of a temporal discounting of reward value – that is, an encoding in which reward size and delay have opposing effects on firing rate (Figure 4A). Over the two phases of the waiting period, some neurons exhibited a consistently positive encoding of reward combined with a negative encoding of delay (vector angles –90° < θ < 0°; referred to as the ‘Discounting–’ pattern in Figure 4A; see, e.g., Figure 3C, right), while others modulated their activity in the converse pattern with a negative βReward and positive βDelay value (90° < θ < 180°; referred to as ‘Discounting+’ pattern; see, e.g., Figure 3C, middle). Other neurons drastically changed their pattern between phases of the waiting period (Figure 3C, left). And some encoded reward size and delay in an additive fashion, inconsistent with a signal reflecting subjective value and referred to here as ‘Compounding+’ and ‘Compounding–’ patterns (Figure 4A; see, e.g., Figure 3C, left). Compounding signals like these are inconsistent with a temporal discounting of reward value and may instead be attributable to extraneous factors such as arousal or attentional engagement.

Figure 4 (with 1 supplement). Subthalamic nucleus (STN) neurons exhibit mixed signals in phase 1 (0–2 s post-instruction) and phase 2 (2–3.5 s post-instruction). (A) Schematic depiction of the regression subspace composed of reward size and delay.
Various patterns of neural encoding could be categorized depending on the angle (θ) of vectors: Discounting– (between –90 and 0°); Discounting+ (between 90 and 180°); Compounding+ (between 0 and 90°); Compounding– (between –180 and –90°). (B) The angle differences calculated between phases 1 and 2 (θ2 – θ1) show how the neural encodings were modified during the course of the hold period. A positive angle difference corresponds to a turn anticlockwise, while a negative one corresponds to a turn clockwise. (C, D) Vectorial encoding of reward size and delay for all task-related neurons in phases 1 (C) and 2 (D). Vector sums were calibrated by subtracting the mean β values of the pre-instruction epoch and then dividing by 2 SD of this control period. The red dashed lines indicate the population vectors. (E, F) Fractions of task-related neurons categorized as Discounting cells (Disc-, Disc+) or Compounding cells (Comp+, Comp-). (G, H) Vector magnitudes (mean ± SEM) were compared between different categories of task-related neurons (one-way ANOVA *F > 2.1, p<0.05). The central line of the box plots represents the median, the edges of the box show the interquartile range, and the edges of the whiskers show the full extent of the overall distributions.

At the population level, STN neurons exhibited variable mixed signals over the course of the waiting period with, on average, a change in angle of 34° measured between vector sums of phases 1 and 2 (Figure 4B–D). First, in phase 1, the neural encoding of reward size and delay parameters predominantly followed a Discounting pattern as evidenced by the fraction of neurons with Discounting– type encodings (χ2 = 17.1, df = 3, p=0.007; Figure 4E), and the longer mean vector magnitude of Discounting– units (one-way ANOVA, F(3,121) = 4.54, p=0.005; Figure 4G).
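The angle-based categories defined in the Figure 4 caption can be expressed as a short function. The sketch assumes that βReward lies on the horizontal axis and βDelay on the vertical axis, which matches the sign conventions given in the text (for example, positive βReward with negative βDelay gives an angle between –90° and 0°). Leaving vectors that fall exactly on an axis unclassified is a choice of this sketch, and the function names and toy coefficients are hypothetical.

```python
import math


def vector_sum(beta_reward, beta_delay):
    """Sum per-bin (beta_reward, beta_delay) vectors; return (angle in degrees, magnitude)."""
    x = sum(beta_reward)
    y = sum(beta_delay)
    return math.degrees(math.atan2(y, x)), math.hypot(x, y)


def classify_angle(theta):
    """Map an angle in degrees to the encoding pattern named in the Figure 4 caption."""
    if -90.0 < theta < 0.0:
        return "Discounting-"   # positive reward coding, negative delay coding
    if 90.0 < theta < 180.0:
        return "Discounting+"   # negative reward coding, positive delay coding
    if 0.0 < theta < 90.0:
        return "Compounding+"   # both coefficients positive
    if -180.0 < theta < -90.0:
        return "Compounding-"   # both coefficients negative
    return "Unclassified"       # exactly on an axis (assumption of this sketch)


def angle_difference(theta1, theta2):
    """theta2 - theta1 wrapped to [-180, 180); positive means an anticlockwise turn."""
    return (theta2 - theta1 + 180.0) % 360.0 - 180.0


# Toy coefficients for one neuron in two phases (not data from the paper).
theta1, mag1 = vector_sum([1.0, 2.0], [-1.0, -0.5])   # reward up, delay down
theta2, mag2 = vector_sum([0.5, 0.5], [1.0, 1.0])     # both up
```

In this toy case the neuron would be labeled Discounting– in phase 1 and Compounding+ in phase 2, with a positive (anticlockwise) angle difference between the two phases.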
Of the 124 task-related neurons, 60 (48%) increased firing rates as a function of reward size while they decreased according to the temporal delay to reward delivery (i.e., consistent with a Discounting– pattern). The remaining neurons were distributed across the other three encoding patterns. Vector magnitudes for neurons with a Discounting– firing pattern were longer, on average, than those for neurons with either type of Compounding pattern, while the vector magnitude for neurons with a Discounting+ pattern fell in-between. Notably, the angle of the mean vector across all neurons (θ1 = –12°, Figure 4C) showed that, despite the wide diversity of encoding patterns across individual neurons in phase 1, the whole neural ensemble encoded information about reward size and delay in a pattern that was strongly consistent with a temporal discounting of value (i.e., Discounting–). The significance of this bias in the population encoding was supported further by the observation that the angles of individual vectors were distributed in a markedly non-uniform fashion (Rayleigh’s test, z = 12.5, p<0.001; Figure 4C). Hence, during the first 2 s of the post-instruction period, the neural ensemble combined information related to reward size and delay into a coherent population-scale signal that reflected subjective value according to a Discounting– pattern. Then, in phase 2, the population encoding of reward size and delay parameters changed drastically to a predominantly Compounding pattern (Figure 4D). The fraction of neurons with Compounding+ type encodings increased markedly (χ2 = 9.9, df = 3, p=0.02; Figure 4F), and mean vector magnitude of Compounding+ units was longer than those of units with other types of encoding (one-way ANOVA, F(3,1
