Decision Threshold Learning in the Basal Ganglia for Multiple Alternatives.

Abstract

In recent years, researchers have integrated the historically separate reinforcement learning (RL) and evidence-accumulation-to-bound approaches to decision modeling. A particular outcome of these efforts has been the RL-DDM, a model that combines value learning through reinforcement with a diffusion decision model (DDM). While the RL-DDM is a conceptually elegant extension of the original DDM, it shares the DDM's problem of not scaling well to decisions with more than two options. Furthermore, in its current form, the RL-DDM lacks the flexibility to adapt to rapid, context-cued changes in the reward environment. The question of how best to extend combined RL and DDM models so they can handle multiple choices remains open. Moreover, it is currently unclear how these algorithmic solutions should map to neurophysical processes in the brain, particularly in relation to so-called go/no-go-type models of decision making in the basal ganglia. Here, we propose a solution that addresses these issues by combining a previously proposed decision model based on the multichoice sequential probability ratio test (MSPRT) with a dual-pathway model of decision threshold learning in the basal ganglia region of the brain. Our model learns decision thresholds that optimize the trade-off between time cost and the cost of errors, and so efficiently allocates the amount of time spent on decision deliberation. In addition, the model is context dependent and hence flexible to changes in the speed-accuracy trade-off (SAT) in the environment. Furthermore, the model reproduces the magnitude effect, a phenomenon seen experimentally in value-based decisions, and is agnostic to the type of evidence, so it can be used on perceptual decisions, value-based decisions, and other types of modeled evidence.
The broader significance of the model is that it contributes to the active research area of how learning systems interact, by linking the previously separate RL-DDM models to dopaminergic models of motivation and risk taking in the basal ganglia, and by scaling to multiple alternatives.
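
The abstract names the MSPRT and a threshold that balances time cost against error cost, but gives no formulas. The following is a minimal Python sketch under stated assumptions, not the authors' model: Gaussian evidence per channel, a uniform prior, a fixed threshold on the posterior of the leading alternative, and a hypothetical cost of `time_cost * mean decision time + error rate`. The threshold is picked by brute-force grid search over simulated trials, which stands in for, and is not, the dual-pathway basal ganglia learning rule the paper proposes. All function and parameter names (`msprt_trial`, `summarize`, `best_threshold`, `mu_hi`, `mu_lo`, `time_cost`) are hypothetical.

```python
import math
import random


def msprt_trial(n_alt, true_idx, threshold, mu_hi=1.0, mu_lo=0.0,
                sigma=1.0, max_steps=10_000, rng=random):
    """Simulate one MSPRT trial with Gaussian evidence (illustrative assumptions).

    Each step, channel `true_idx` emits x ~ N(mu_hi, sigma) and every other
    channel emits x ~ N(mu_lo, sigma). Under hypothesis H_i ("option i is
    correct"), only the term x_i * (mu_hi - mu_lo) / sigma**2 differs across
    hypotheses, so it is used as the per-step log-likelihood increment.
    Returns (choice, number_of_steps).
    """
    log_lik = [0.0] * n_alt
    gain = (mu_hi - mu_lo) / sigma ** 2
    log_threshold = math.log(threshold)
    best = 0
    for step in range(1, max_steps + 1):
        for i in range(n_alt):
            mean = mu_hi if i == true_idx else mu_lo
            log_lik[i] += gain * rng.gauss(mean, sigma)
        # Log posterior under a uniform prior: L_i - log(sum_j exp(L_j)).
        m = max(log_lik)
        log_norm = m + math.log(sum(math.exp(ll - m) for ll in log_lik))
        best = max(range(n_alt), key=lambda i: log_lik[i])
        # Stop once the leading alternative's posterior reaches the threshold.
        if log_lik[best] - log_norm >= log_threshold:
            return best, step
    return best, max_steps


def summarize(threshold, n_alt=4, n_trials=2000, seed=0):
    """Monte Carlo accuracy and mean decision time (in steps) for one threshold."""
    rng = random.Random(seed)
    correct = total_steps = 0
    for _ in range(n_trials):
        true_idx = rng.randrange(n_alt)
        choice, steps = msprt_trial(n_alt, true_idx, threshold, rng=rng)
        correct += choice == true_idx
        total_steps += steps
    return correct / n_trials, total_steps / n_trials


def best_threshold(thresholds, time_cost):
    """Grid search for the threshold minimising time_cost * mean RT + error rate."""
    def cost(th):
        accuracy, mean_rt = summarize(th)
        return time_cost * mean_rt + (1.0 - accuracy)
    return min(thresholds, key=cost)
```

For example, `best_threshold([0.6, 0.8, 0.9, 0.95, 0.99], time_cost=0.01)` returns whichever candidate has the lowest simulated cost. Raising `time_cost` tends to select a lower threshold (faster, less accurate decisions), which is the speed-accuracy trade-off the abstract refers to.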

Similar Papers
  • Research Article
  • Cited by 155
  • 10.1016/j.neuron.2008.12.003
Similarity Effect and Optimal Control of Multiple-Choice Decision Making
  • Dec 1, 2008
  • Neuron
  • Moran Furman + 1 more

  • Research Article
  • Cited by 165
  • 10.3758/s13423-018-1554-2
A reinforcement learning diffusion decision model for value-based decisions
  • Mar 28, 2019
  • Psychonomic Bulletin & Review
  • Laura Fontanesi + 3 more

Psychological models of value-based decision-making describe how subjective values are formed and mapped to single choices. Recently, additional efforts have been made to describe the temporal dynamics of these processes by adopting sequential sampling models from the perceptual decision-making tradition, such as the diffusion decision model (DDM). These models, when applied to value-based decision-making, allow mapping of subjective values not only to choices but also to response times. However, very few attempts have been made to adapt these models to situations in which decisions are followed by rewards, thereby producing learning effects. In this study, we propose a new combined reinforcement learning diffusion decision model (RLDDM) and test it on a learning task in which pairs of options differ with respect to both value difference and overall value. We found that participants became more accurate and faster with learning, responded faster and more accurately when options had more dissimilar values, and decided faster when confronted with more attractive (i.e., overall more valuable) pairs of options. We demonstrate that the suggested RLDDM can accommodate these effects and does so better than previously proposed models. To gain a better understanding of the model dynamics, we also compare it to standard DDMs and reinforcement learning models. Our work is a step forward towards bridging the gap between two traditions of decision-making research.
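
The core RL-DDM idea in this abstract, learned values driving the drift of a diffusion process, can be sketched in a few lines of Python. This is a generic two-option RL-DDM and is simpler than the model variants the study compares; the linear drift scaling (`v_scale`), symmetric bounds, Euler-Maruyama step size, initial values, and all function names (`ddm_trial`, `run_rlddm`) are assumptions for illustration.

```python
import math
import random


def ddm_trial(drift, bound, ndt=0.3, dt=0.002, noise=1.0, rng=random, max_t=10.0):
    """Euler-Maruyama simulation of a DDM starting midway between bounds at +/-bound.

    Returns (choice, rt): choice 0 if the upper bound (option 0) is hit,
    1 for the lower bound (option 1); rt includes non-decision time `ndt`.
    """
    x, t = 0.0, 0.0
    sd = noise * math.sqrt(dt)
    while abs(x) < bound and t < max_t:
        x += drift * dt + rng.gauss(0.0, sd)
        t += dt
    return (0 if x > 0 else 1), t + ndt


def run_rlddm(reward_probs, n_trials=200, alpha=0.1, v_scale=3.0,
              bound=1.0, seed=0):
    """Two-option RL-DDM sketch: delta-rule Q-values set the DDM drift rate."""
    rng = random.Random(seed)
    q = [0.5, 0.5]  # assumed initial value estimates
    choices, rts = [], []
    for _ in range(n_trials):
        # Drift toward option 0 scales linearly with its learned value advantage.
        drift = v_scale * (q[0] - q[1])
        choice, rt = ddm_trial(drift, bound, rng=rng)
        reward = 1.0 if rng.random() < reward_probs[choice] else 0.0
        # Delta-rule (Rescorla-Wagner) update of the chosen option only.
        q[choice] += alpha * (reward - q[choice])
        choices.append(choice)
        rts.append(rt)
    return q, choices, rts
```

Calling `run_rlddm([0.8, 0.2])` should show choices shifting toward option 0 as its value estimate grows, because a larger value difference means a larger drift and hence faster, more consistent choices.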

  • Research Article
  • Cited by 48
  • 10.7554/elife.63055
A new model of decision processing in instrumental learning tasks.
  • Jan 27, 2021
  • eLife
  • Steven Miletić + 5 more

Learning and decision-making are interactive processes, yet cognitive modeling of error-driven learning and decision-making has largely evolved separately. Recently, evidence accumulation models (EAMs) of decision-making and reinforcement learning (RL) models of error-driven learning have been combined into joint RL-EAMs that can in principle address these interactions. However, we show that the most commonly used combination, based on the diffusion decision model (DDM) for binary choice, consistently fails to capture crucial aspects of response times observed during reinforcement learning. We propose a new RL-EAM based on an advantage racing diffusion (ARD) framework for choices among two or more options that not only addresses this problem but captures stimulus difficulty, speed-accuracy trade-off, and stimulus-response-mapping reversal effects. The RL-ARD avoids fundamental limitations imposed by the DDM on addressing effects of absolute values of choices, as well as extensions beyond binary choice, and provides a computationally tractable basis for wider applications.
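
To make the "advantage racing diffusion" idea concrete, here is a minimal Python sketch for the binary case. It assumes each option's accumulator has drift `v0 + w_d * (Q_i - Q_j) + w_s * (Q_i + Q_j)` (an urgency term, a value-advantage term, and a value-sum term), which is our reading of the binary RL-ARD rather than a verified transcription of the paper; the parameter values, Euler-Maruyama simulation, and names (`race_trial`, `ard_drifts`, `run_rl_ard`) are assumptions.

```python
import random


def race_trial(drifts, bound=1.0, ndt=0.2, dt=0.002, noise=1.0,
               rng=random, max_t=10.0):
    """Race of independent diffusion accumulators starting at 0.

    The first accumulator to reach `bound` determines the choice.
    Returns (winner_index, rt) with rt including non-decision time `ndt`.
    """
    x = [0.0] * len(drifts)
    sd = noise * dt ** 0.5
    t = 0.0
    while t < max_t:
        t += dt
        for i, v in enumerate(drifts):
            x[i] += v * dt + rng.gauss(0.0, sd)
        leader = max(range(len(x)), key=lambda i: x[i])
        if x[leader] >= bound:
            return leader, t + ndt
    return max(range(len(x)), key=lambda i: x[i]), t + ndt


def ard_drifts(q, v0=1.0, w_d=2.0, w_s=0.5):
    """Binary advantage drifts: urgency v0, value advantage, and value sum."""
    q1, q2 = q
    total = q1 + q2
    return [v0 + w_d * (q1 - q2) + w_s * total,
            v0 + w_d * (q2 - q1) + w_s * total]


def run_rl_ard(reward_probs, n_trials=200, alpha=0.1, seed=0):
    """RL-ARD sketch: delta-rule Q-values feed the advantage race each trial."""
    rng = random.Random(seed)
    q = [0.5, 0.5]  # assumed initial value estimates
    choices = []
    for _ in range(n_trials):
        choice, _rt = race_trial(ard_drifts(q), rng=rng)
        reward = 1.0 if rng.random() < reward_probs[choice] else 0.0
        q[choice] += alpha * (reward - q[choice])  # delta-rule update
        choices.append(choice)
    return q, choices
```

Because the value-sum term raises both drifts when `w_s > 0`, a pair like `[0.8, 0.6]` should be answered faster on average than `[0.4, 0.2]` despite the same value difference. That is the kind of absolute-value effect a single DDM drift based only on the difference cannot produce.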

  • Research Article
  • Cited by 11
  • 10.7554/elife.63055.sa2
A new model of decision processing in instrumental learning tasks
  • Dec 18, 2020
  • eLife
  • Steven Miletić + 5 more

  • Research Article
  • Cited by 55
  • 10.3758/s13415-019-00723-1
Decomposing the effects of context valence and feedback information on speed and accuracy during reinforcement learning: a meta-analytical approach using diffusion decision modeling
  • Jun 1, 2019
  • Cognitive, Affective, & Behavioral Neuroscience
  • Laura Fontanesi + 2 more

Reinforcement learning (RL) models describe how humans and animals learn by trial-and-error to select actions that maximize rewards and minimize punishments. Traditional RL models focus exclusively on choices, thereby ignoring the interactions between choice preference and response time (RT), or how these interactions are influenced by contextual factors. However, in the field of perceptual decision-making, such interactions have proven to be important to dissociate between different underlying cognitive processes. Here, we investigated such interactions to shed new light on overlooked differences between learning to seek rewards and learning to avoid losses. We leveraged behavioral data from four RL experiments, which feature manipulations of two factors: outcome valence (gains vs. losses) and feedback information (partial vs. complete feedback). A Bayesian meta-analysis revealed that these contextual factors differently affect RTs and accuracy: While valence only affects RTs, feedback information affects both RTs and accuracy. To dissociate between the latent cognitive processes, we jointly fitted choices and RTs across all experiments with a Bayesian, hierarchical diffusion decision model (DDM). We found that the feedback manipulation affected drift rate, threshold, and non-decision time, suggesting that it was not a mere difficulty effect. Moreover, valence affected non-decision time and threshold, suggesting a motor inhibition in punishing contexts. To better understand the learning dynamics, we finally fitted a combination of RL and DDM (RLDDM). We found that while the threshold was modulated by trial-specific decision conflict, the non-decision time was modulated by the learned context valence. Overall, our results illustrate the benefits of jointly modeling RTs and choice data during RL, to reveal subtle mechanistic differences underlying decisions in different learning contexts.

  • Research Article
  • 10.3758/s13423-024-02520-5
The neural implausibility of the diffusion decision model doesn’t matter for cognitive psychometrics, but the Ornstein-Uhlenbeck model is better
  • May 14, 2024
  • Psychonomic Bulletin & Review
  • Jia-Shun Wang + 1 more

In cognitive psychometrics, the parameters of cognitive models are used as measurements of the processes underlying observed behavior. In decision making, the diffusion decision model (DDM) is by far the most commonly used cognitive psychometric tool. One concern when using this model is that more recent theoretical accounts of decision-making place more emphasis on neural plausibility, and thus incorporate many assumptions not found in the DDM. One such model is the Ising Decision Maker (IDM), which builds from the assumption that two pools of neurons with self-excitation and mutual inhibition receive perceptual input from external excitatory fields. In this study, we investigate whether the lack of such mechanisms in the DDM compromises its ability to measure the processes it does purport to measure. We cross-fit the DDM and IDM, and find that the conclusions drawn from the DDM would be mostly consistent with those from an analysis using a more neurally plausible model. We also show that the Ornstein-Uhlenbeck model (OUM), a variant of the DDM that includes the potential for leakage (or self-excitation), reaches similar conclusions to the DDM regarding the assumptions they share, while also sharing an interpretation with the IDM in terms of self-excitation (but not leakage). Since the OUM is relatively easy to fit to data, while being able to capture more neurally plausible mechanisms, we propose that it be considered an alternative cognitive psychometric tool to the DDM.
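
The relation between the DDM and the OUM described above comes down to one extra state-dependent term in the drift. Below is a minimal Python sketch assuming the convention `dx = (drift - leak * x) dt + noise dW`, so that `leak > 0` is leakage, `leak < 0` is self-excitation, and `leak = 0` recovers the DDM; sign conventions differ between papers, and the function names and parameter values are hypothetical.

```python
import math
import random


def ou_trial(drift, leak, bound=1.0, dt=0.002, noise=1.0, rng=random, max_t=5.0):
    """Euler-Maruyama for dx = (drift - leak * x) dt + noise dW with bounds at +/-bound.

    leak > 0 gives leakage, leak < 0 gives self-excitation, leak == 0 is the DDM.
    Returns (hit_upper, decision_time).
    """
    x, t = 0.0, 0.0
    sd = noise * math.sqrt(dt)
    while abs(x) < bound and t < max_t:
        x += (drift - leak * x) * dt + rng.gauss(0.0, sd)
        t += dt
    return x > 0, t


def mean_decision_time(leak, drift=1.0, n=300, seed=0):
    """Average simulated decision time (excluding non-decision time)."""
    rng = random.Random(seed)
    return sum(ou_trial(drift, leak, rng=rng)[1] for _ in range(n)) / n
```

With these settings, self-excitation (`leak=-3.0`) should yield shorter mean decision times than the DDM (`leak=0.0`), and leakage (`leak=3.0`) longer ones, because the extra term respectively pushes the state away from or pulls it back toward zero.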

  • Research Article
  • Cited by 678
  • 10.1016/j.neuron.2008.09.034
Decision Making in Recurrent Neuronal Circuits
  • Oct 1, 2008
  • Neuron
  • Xiao-Jing Wang

  • Research Article
  • Cited by 91
  • 10.1016/j.jmp.2018.09.004
Estimating across-trial variability parameters of the Diffusion Decision Model: Expert advice and recommendations
  • Oct 16, 2018
  • Journal of Mathematical Psychology
  • Udo Boehm + 17 more

  • Research Article
  • Cited by 34
  • 10.1016/j.jml.2017.04.003
Diffusion vs. linear ballistic accumulation: Different models, different conclusions about the slope of the zROC in recognition memory
  • May 17, 2017
  • Journal of Memory and Language
  • Adam F Osth + 3 more

  • Research Article
  • Cited by 24
  • 10.1186/s41235-018-0119-2
The impact of speed and bias on the cognitive processes of experts and novices in medical image decision-making
  • Jul 4, 2018
  • Cognitive Research: Principles and Implications
  • Jennifer S Trueblood + 9 more

Training individuals to make accurate decisions from medical images is a critical component of education in diagnostic pathology. We describe a joint experimental and computational modeling approach to examine the similarities and differences in the cognitive processes of novice participants and experienced participants (pathology residents and pathology faculty) in cancer cell image identification. For this study we collected a bank of hundreds of digital images that were identified by cell type and classified by difficulty by a panel of expert hematopathologists. The key manipulations in our study included examining the speed-accuracy tradeoff as well as the impact of prior expectations on decisions. In addition, our study examined individual differences in decision-making by comparing task performance to domain-general visual ability (as measured using the Novel Object Memory Test, NOMT; Richler et al., Cognition 166:42–55, 2017). Using signal detection theory and the diffusion decision model (DDM), we found many similarities between experts and novices in our task. While experts tended to have better discriminability, the two groups responded similarly to time pressure (i.e., reduced caution under speed instructions in the DDM) and to the introduction of a probabilistic cue (i.e., increased response bias in the DDM). These results have important implications for training in this area as well as using novice participants in research on medical image perception and decision-making.

  • Research Article
  • Cited by 7
  • 10.3758/s13428-023-02162-w
PyBEAM: A Bayesian approach to parameter inference for a wide class of binary evidence accumulation models.
  • Aug 7, 2023
  • Behavior Research Methods
  • Matthew Murrow + 1 more

Many decision-making theories are encoded in a class of processes known as evidence accumulation models (EAMs). These assume that noisy evidence stochastically accumulates until a set threshold is reached, triggering a decision. One of the most successful and widely used of this class is the Diffusion Decision Model (DDM). The DDM, however, is limited in scope and does not account for processes such as evidence leakage, changes of evidence, or time-varying caution. More complex EAMs can encode a wider array of hypotheses, but are currently limited by computational challenges. In this work, we develop the Python package PyBEAM (Bayesian Evidence Accumulation Models) to fill this gap. Toward this end, we develop a general probabilistic framework for predicting the choice and response time distributions for a general class of binary decision models. In addition, we have heavily optimized this modeling process computationally and integrated it with PyMC, a widely used Python package for Bayesian parameter estimation. This 1) substantially expands the class of EAMs to which Bayesian methods can be applied, 2) reduces the computational time to do so, and 3) lowers the barrier to entry for working with these models. Here we demonstrate the concepts behind this methodology, its application to parameter recovery for a variety of models, and apply it to a recently published data set to demonstrate its practical use.

  • Research Article
  • 10.1177/10711813251361000
Experimental Investigation and Queuing Network (QN) Modeling of Speed-Accuracy Tradeoff (SAT) and Speed-Confidence Tradeoff (SCT) in Multi-Task Human-Robot Collaboration (HRC)
  • Aug 5, 2025
  • Proceedings of the Human Factors and Ergonomics Society Annual Meeting
  • Yuanchen Wang + 2 more

In Human-Robot Collaboration (HRC) environments, people are often required to perform multiple tasks under time pressure while monitoring or interacting with robotic systems. Time pressure may make people decide faster, but with lower accuracy and confidence, showing Speed-Accuracy Tradeoff (SAT) and Speed-Confidence Tradeoff (SCT). Understanding how humans make decisions under multitasking conditions with time pressure is important for enhancing safety and productivity. To investigate these effects, we conducted experiments in which participants viewed video clips of a robot arm reaching for one of two possible objects and predicted the robot’s final target with four levels of time pressure. In the multitasking condition, participants also performed a concurrent tracking task to simulate a continuous robot control task. Experimental results revealed clear evidence of SAT and SCT, along with significant negative effects of multitasking on prediction accuracy, confidence, and response time. To explain and account for these effects, we developed a computational model, by using the departure processes of the Queuing Network–Model Human Processor (QN-MHP) as a diffusion decision model. The model accurately replicated experimental results, highlighting its potential for predicting human behavior in multitasking human-robot collaboration scenarios.

  • Research Article
  • Cited by 507
  • 10.1146/annurev-psych-122414-033645
Sequential Sampling Models in Cognitive Neuroscience: Advantages, Applications, and Extensions
  • Sep 17, 2015
  • Annual Review of Psychology
  • B.U. Forstmann + 2 more

Sequential sampling models assume that people make speeded decisions by gradually accumulating noisy information until a threshold of evidence is reached. In cognitive science, one such model, the diffusion decision model, is now regularly used to decompose task performance into underlying processes such as the quality of information processing, response caution, and a priori bias. In the cognitive neurosciences, the diffusion decision model has recently been adopted as a quantitative tool to study the neural basis of decision making under time pressure. We present a selective overview of several recent applications and extensions of the diffusion decision model in the cognitive neurosciences.

  • Research Article
  • 10.3758/s13423-026-02861-3
The diffusion model's drift rate parameter primarily reflects efficiency, rather than speed, of evidence accumulation.
  • Feb 26, 2026
  • Psychonomic Bulletin & Review
  • Alexander Weigard + 3 more

Applications of the diffusion decision model (DDM) to the study of cognitive individual differences consistently find that the model's drift rate (v) parameter forms a cohesive factor across many tasks and relates to measures of higher-order cognitive functioning, including general cognitive ability and working memory. This parameter is often interpreted as a measure of "processing speed," a traditional psychometric construct thought to reflect an individual's basic speed of information processing across tasks. However, conceptual differences between v and traditional notions of processing speed make this mapping far from straightforward. Racing accumulator models, which provide a more flexible and comprehensive account of behavioral data than the DDM, allow for the speed with which individuals accumulate evidence to be dissociated from the efficiency with which they accumulate task-relevant evidence (versus task-irrelevant evidence). We applied the DDM and a racing accumulator model to three tasks across three independent datasets to gauge the extent to which v parameter findings from the cognitive individual differences literature reflect speed of evidence accumulation (SEA) versus efficiency of evidence accumulation (EEA). Across all tasks, v was more strongly related to EEA than SEA. EEA was consistently related to measures of general cognitive ability, working memory, and executive function whereas SEA explained <1% of the variance in each. These findings suggest individual differences in the DDM's v parameter, and its relations with higher-order cognitive abilities, primarily reflect EEA rather than SEA and challenge the widespread practice of equating v with the traditional "processing speed" construct.

  • Abstract
  • 10.1186/1471-2202-14-s1-p425
An actor-critic model of saccade adaptation
  • Jul 1, 2013
  • BMC Neuroscience
  • Manabu Inaba + 1 more

The basal ganglia and the cerebellum are subcortical structures indispensable for voluntary motor control and motor learning. They are thought to perform reinforcement learning and supervised learning, respectively, and to interact with each other [1]. Yet how these structures and their learning mechanisms interact remains unknown. In this study, we propose a model of interaction between the basal ganglia and the cerebellum for voluntary motor control and motor learning. We consider that the basal ganglia perform temporal difference (TD) learning. Specifically, based on electrophysiological experiments [2], we assume that neurons in the ventral tegmental area (VTA), a part of the basal ganglia, represent delta, the TD prediction error. On the other hand, we consider that the cerebellum generates motor commands through supervised learning, for which the inferior olive (IO) provides teacher signals. Here, based on the anatomical finding of dopaminergic inputs from the VTA to the IO [3], we assume that the cerebellum can receive TD prediction-error information as a teacher signal via the IO. Finally, we propose a scheme of the interaction between the basal ganglia and the cerebellum as an actor-critic model in reinforcement learning (Figure 1A; [4]).

Figure 1: (A) Proposed scheme of interaction between the basal ganglia and the cerebellum as an actor-critic model. (B) Illustration of direction adaptation of saccades.

We apply the proposed scheme to double-step adaptation of saccades, which are voluntary eye movements mediated by a distributed network that includes both the basal ganglia and the cerebellum. A double-step saccade adaptation paradigm called direction adaptation proceeds as follows (Figure 1B). Initially, the eye fixates the center position. Next, a target appears at a certain position, and the eye moves to the target (first saccade). When the saccade starts, the target is immediately removed and reappears at another position. In turn, the eye moves to the second target (corrective saccade). After many repeated trials, when the first target appears, the eye moves to the position of the expected second target. Our proposed model reproduces this direction adaptation of saccades. These results suggest that the interaction between the basal ganglia and the cerebellum as an actor-critic model provides a powerful motor control and learning mechanism.
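
The actor-critic idea in this abstract, one TD prediction error training both a critic and an actor, can be illustrated with a generic one-state actor-critic in Python. This is not the authors' basal ganglia-cerebellum model and has no supervised cerebellar component; the discrete saccade endpoints, the displaced target, the reward of minus the endpoint error, the softmax actor with a gradient-bandit update, and all names (`train_actor_critic`, `endpoints`, `target`) are hypothetical choices for a toy direction-adaptation task.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_actor_critic(endpoints, target, n_trials=5000, alpha_critic=0.1,
                       alpha_actor=0.05, seed=0):
    """One-state actor-critic on a toy saccade task (hypothetical setup).

    Each trial the actor picks a saccade endpoint, the reward is minus the
    distance to the displaced second target, and the TD error
    delta = r - V trains both the critic V and the actor preferences.
    """
    rng = random.Random(seed)
    v = 0.0
    prefs = [0.0] * len(endpoints)
    for _ in range(n_trials):
        probs = softmax(prefs)
        a = rng.choices(range(len(endpoints)), weights=probs)[0]
        reward = -abs(endpoints[a] - target)
        delta = reward - v  # single-step episode, so there is no gamma * V(s') term
        v += alpha_critic * delta  # critic update
        for b in range(len(prefs)):
            # Policy-gradient step for a softmax actor, scaled by the TD error.
            prefs[b] += alpha_actor * delta * ((1.0 if b == a else 0.0) - probs[b])
    return v, prefs
```

With `endpoints=[0, 2, 4, 6, 8, 10]` and `target=6`, the actor's preferences should come to favour the endpoint at 6, loosely mirroring how the first saccade shifts toward the expected second target after repeated trials.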
