
Related Topics

  • Bandit Algorithm

Articles published on Thompson Sampling

427 Search results, sorted by recency
  • Research Article
  • 10.1007/s00422-026-01037-5
A bio-inspired minimal model for non-stationary K-armed bandits.
  • Mar 10, 2026
  • Biological cybernetics
  • Krubeal Danieli + 1 more

While reinforcement learning algorithms have made significant progress in solving multi-armed bandit problems, they often lack biological plausibility in architecture and dynamics. Here, we propose a bio-inspired neural model based on interacting populations of rate neurons, drawing inspiration from the orbitofrontal cortex and anterior cingulate cortex. Our model achieves robust performance across various stochastic bandit problems, matching the effectiveness of standard algorithms such as Thompson Sampling and UCB. Notably, the model exhibits adaptive behavior: it employs greedy strategies in low-uncertainty situations while increasing exploratory behavior as uncertainty rises. Through evolutionary optimization, the model's hyperparameters converged to values that align with the principles of synaptic mechanisms, particularly in terms of synapse-dependent neural activity and learning rate adaptation. These findings suggest that biologically inspired computational architectures can achieve competitive performance while providing insights into neural mechanisms of decision-making under uncertainty.

  • Research Article
  • 10.7717/peerj-cs.3688
Biologically-inspired emotional processing for adaptive decision-making in non-stationary environments
  • Feb 27, 2026
  • PeerJ Computer Science
  • Jaeyeon Kim + 1 more

Background: Artificial intelligence (AI) often struggles to adapt in non-stationary environments where conditions change unpredictably. In contrast, biological organisms utilize emotional processes not as irrational noise, but as rapid heuristics for managing uncertainty. This study investigates whether computational mechanisms inspired by mammalian affective systems provide advantages for adaptive decision-making. Methods: The Emotional-Cognition Integration Architecture (ECIA) was developed, incorporating computational analogs of eight emotion-like signals designed for reinforcement learning contexts, hippocampus-inspired episodic memory, and dopamine-modulated adaptive learning. Using large-scale experimental replication (3,600 runs across 12 master seeds), ECIA was evaluated against both traditional algorithms (ε-greedy, upper confidence bound (UCB), Thompson Sampling) and improved non-stationary baselines (Sliding Window UCB, Adaptive Thompson Sampling) in three distinct environments designed to test different aspects of adaptation. Results: ECIA demonstrated environment-specific performance patterns reflecting a functional trade-off. In unpredictable settings characterized by sudden regime shifts and stochastic perturbations, ECIA significantly outperformed all baselines (p < 0.001). However, in strictly deterministic patterns, ECIA incurred a “cost of complexity,” underperforming compared to Naive UCB (0.8014 vs. 0.8522). This trade-off suggests functional specialization for uncertainty management rather than universal superiority. Ablation studies revealed strong synergistic integration among components, with combined removal causing disproportionate degradation far exceeding individual effects, and highlighted a “dopamine paradox” where adaptive plasticity benefited uncertain environments but destabilized predictable ones. Conclusions: These findings demonstrate that emotion-inspired computational mechanisms, drawn from mammalian brain architecture, function as specialized tools for managing environmental volatility. While they incur efficiency costs in stable environments, they provide essential robustness in high-uncertainty domains. This work offers both a practical framework for adaptive AI systems in domains such as clinical decision support and financial trading, and computational insights into why biological intelligence integrates affective processing with cognition.

  • Research Article
  • 10.2196/77323
Smartphone App Using Reinforcement Learning for Obesity: Single-Arm Feasibility Study.
  • Feb 26, 2026
  • JMIR human factors
  • Ken Kurisu + 4 more

While behavioral interventions remain an evidence-based treatment for obesity, they often require long durations and frequent sessions. To address this, we hypothesized that interventions delivered in daily life via a smartphone app combined with personalized optimization using reinforcement learning may effectively support behavior changes. This study aimed to develop and evaluate the feasibility of such an app for individuals with obesity. We developed a smartphone app to assist in setting and reviewing daily behaviors related to weight loss. On the screen on which daily behaviors were shown, the order of presentation was optimized using Thompson sampling, a multiarmed bandit algorithm. Twenty individuals with obesity used the app for 4 weeks, and the daily app use rates were quantified. Body weight and mood status were measured daily during the study, and a brief-type self-administered diet history questionnaire and the International Physical Activity Questionnaire were administered at the beginning and end of the study. Changes in these measures were evaluated using the Wilcoxon signed rank test. Furthermore, the longitudinal data collected during this study were analyzed using a linear mixed-effects model to examine factors related to the number of behaviors performed daily. All 20 recruited individuals with obesity completed the 4-week study schedule. The median app use rate was 98.3% (range 76.9%-100%). Significant improvements were observed in BMI (median at start 34.9 kg/m2, range 27.4-52.9; median at end 34.1 kg/m2, range 26.7-51.0; P=.01), as well as daily energy intake and weekend sitting time. The linear mixed-effects model showed a significant association between higher preceding depressive mood levels and fewer behaviors (P<.001). The feasibility of the smartphone app using reinforcement learning for obesity was sufficient, and the potential effectiveness of the treatment was suggested. Preceding depressive mood may influence daily behaviors related to weight loss.

  • Research Article
  • Cite Count Icon 1
  • 10.3390/math14040738
Manifold Causal Conditional Deep Networks for Heterogeneous Treatment Effect Estimation and Policy Evaluation
  • Feb 22, 2026
  • Mathematics
  • Jong-Min Kim

We present a comprehensive framework for estimating heterogeneous treatment effects and evaluating decision-making policies in high-dimensional settings. Our approach combines nonlinear manifold learning techniques—UMAP, t-SNE, and Isomap—with a Causal Conditional Deep Network (CCDN) to model complex nonlinear interactions among covariates, treatments, and outcomes. Within this framework, we assess five treatment assignment policies—Greedy, Thompson Sampling, Epsilon-Greedy, Random, and a novel LLM-guided Thompson policy—across simulated and real-world datasets, including Adult, Wine Quality, and Boston Housing. Empirical results reveal a fundamental trade-off: exploitative policies like Greedy minimize cumulative regret but underperform in recovering heterogeneous treatment effects, whereas exploratory policies, particularly Random and LLM-Thompson, achieve a lower Conditional Average Treatment Effect Root Mean Squared Error (CATE RMSE) by providing broader coverage of the action–covariate space. Notably, LLM-Thompson consistently delivers strong performance across noisy, real-world datasets, highlighting the advantage of uncertainty-aware exploration in capturing treatment heterogeneity. Overall, the framework demonstrates that integrating manifold-informed deep networks with principled exploration strategies enhances both policy optimization and individualized treatment effect estimation in high-dimensional, complex environments.

  • Research Article
  • 10.3390/fi18020100
A Comparative Analysis of Self-Aware Reinforcement Learning Models for Real-Time Intrusion Detection in Fog Networks
  • Feb 14, 2026
  • Future Internet
  • Nyashadzashe Tamuka + 5 more

Fog computing extends cloud services to the network edge, enabling low-latency processing for Internet of Things (IoT) applications. However, this distributed approach is vulnerable to a wide range of attacks, necessitating advanced intrusion detection systems (IDSs) that operate under resource constraints. This study proposes integrating self-awareness (online learning and concept drift adaptation) into a lightweight RL (reinforcement learning)-based IDS for fog networks and quantitatively comparing it with non-RL static thresholds and bandit-based approaches in real time. Novel self-aware reinforcement learning (RL) models, the Hierarchical Adaptive Thompson Sampling–Reinforcement Learning (HATS-RL) model and the Federated Hierarchical Adaptive Thompson Sampling–Reinforcement Learning (F-HATS-RL), were proposed for real-time intrusion detection in a fog network. These self-aware RL policies integrated online uncertainty estimation and concept-drift detection to adapt to evolving attacks. The RL models were benchmarked against the static threshold (ST) model and a widely adopted linear bandit (Linear Upper Confidence Bound/LinUCB). A realistic fog network simulator with heterogeneous nodes and streaming traffic, including multi-type attack bursts and gradual concept drift, was established. The models’ detection performance was compared using metrics including latency, energy consumption, detection accuracy, and the area under the precision–recall curve (AUPR) and the area under the receiver operating characteristic curve (AUROC). Notably, the federated self-aware agent (F-HATS-RL) achieved the best AUROC (0.933) and AUPR (0.857), with a latency of 0.27 ms and the lowest energy consumption of 0.0137 mJ, indicating its ability to detect intrusions in fog networks with minimal energy. The findings suggest that self-aware RL agents can detect dynamic attack patterns in traffic and adapt accordingly, resulting in more stable long-term performance. By contrast, a static model’s accuracy degrades under drift.

  • Research Article
  • 10.54097/hcgmkt71
Multi-Armed Bandits: Algorithms, Applications, and Future Directions
  • Jan 29, 2026
  • Academic Journal of Science and Technology
  • Yurong Zheng

The Multi-Armed Bandit (MAB) problem is a fundamental framework in sequential decision-making. The main goal is to balance exploration of uncertain options with exploitation of known rewards. Originating from classical probability theory, MAB has evolved into a significant branch of modern reinforcement learning. The problem applies across many real-world domains, from online advertising and recommendation systems to healthcare and finance. Over the years, algorithms such as Explore-Then-Commit (ETC), Upper Confidence Bound (UCB), and Thompson Sampling (TS) have emerged as leading strategies, each offering unique trade-offs between exploration efficiency and theoretical regret bounds. This paper gives an overview of the history, main algorithms, applications, and challenges of MAB, and discusses possible future research directions. Furthermore, it highlights the practical implications of these algorithms for real-world decision-making scenarios.
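
As a concrete illustration of the Thompson Sampling strategy surveyed above, here is a minimal Beta–Bernoulli sketch; the arm probabilities, horizon, and seed are illustrative choices, not taken from the paper:

```python
import random

def thompson_sampling(true_probs, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling with uniform Beta(1, 1) priors.

    Each round: sample from every arm's Beta posterior, play the argmax,
    and update that arm's posterior with the observed 0/1 reward.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1] * k  # 1 + observed successes per arm
    beta = [1] * k   # 1 + observed failures per arm
    pulls = [0] * k
    for _ in range(horizon):
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.2, 0.5, 0.8], horizon=2000)
```

Over a few thousand rounds the posteriors concentrate and the best arm attracts nearly all pulls; this probability-matching behavior underlies the regret guarantees the survey discusses.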

  • Research Article
  • 10.54097/schj0v87
A Comparative Analysis of Cumulative Regret Based on Multi-Armed Bandit Algorithms
  • Jan 29, 2026
  • Academic Journal of Science and Technology
  • Muqing Xue

This study aims to conduct a detailed comparison of the performance of three classic Multi-Armed Bandit algorithms: Thompson Sampling, UCB, and ETC. The MAB problem, an important sequential decision-making framework, poses the central challenge of striking a balance between "exploration" and "exploitation". We quantitatively analyzed each algorithm's long-term performance using 100 independent experiments, with cumulative regret as the primary metric. The experimental findings demonstrate that the three algorithms' performance varies significantly in the tested context. The Thompson sampling method performed the best, with the least increase in regret and the lowest final value. The UCB algorithm performed second-best, with regret growing logarithmically. The ETC algorithm saw rapid accumulation of regret in the early stages before stabilizing, but it had the poorest performance because it lacked the ability for continuous exploration. These findings confirm that the Thompson sampling method is the most efficient in balancing exploration and exploitation, and is the best choice for solving such stationary stochastic multi-armed bandit problems.
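
The cumulative-regret metric used in the comparison above is easy to track in simulation. The following sketch accumulates the pseudo-regret of UCB1 on a hypothetical two-armed Bernoulli instance; the arm means, horizon, and seed are illustrative, not the paper's setup:

```python
import math
import random

def ucb1_regret(true_probs, horizon, seed=0):
    """UCB1 on Bernoulli arms, returning the cumulative pseudo-regret.

    Each round plays the arm maximizing mean + sqrt(2 ln t / n_i);
    regret accrues as the gap between the best mean and the chosen arm's mean.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    counts = [0] * k
    sums = [0.0] * k
    best = max(true_probs)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:  # play each arm once to initialise the estimates
            arm = t - 1
        else:
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - true_probs[arm]
    return regret

r = ucb1_regret([0.3, 0.6], horizon=5000)
```

Because UCB1's suboptimal pulls grow only logarithmically in the horizon, the final regret here stays far below the linear worst case of 0.3 × 5000.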

  • Research Article
  • 10.1016/j.cels.2025.101476
Risk-averse optimization of genetic circuits under uncertainty.
  • Jan 21, 2026
  • Cell systems
  • Michal Kobiela + 2 more

Engineering biological systems with specified functions requires navigating an extensive design space, which is challenging to achieve with wet-lab experiments alone. To expedite the design process, mathematical modeling is typically employed to predict circuit function in silico ahead of implementation, which, when coupled with computational optimization, can be used to automatically identify promising designs. However, circuit models are inherently inaccurate, which can result in suboptimal or non-functional in vivo performance. To mitigate this, we propose combining Bayesian inference, Thompson sampling, and risk management to find optimal circuit designs. Our approach employs data from non-functional designs to estimate the distribution of model parameters and then employs risk-averse optimization to select design parameters that are expected to perform well, given parameter uncertainty and biomolecular noise. We illustrate the approach by designing adaptation circuits and genetic oscillators using real and simulated data, with models of varied complexity. A record of this paper's transparent peer review process is included in the supplemental information.

  • Research Article
  • 10.63593/jwe.2025.12.06
Cross-Border E-Commerce TikTok Live Streaming Data Three-Dimensional Optimization Model Construction and Empirical Study — Based on Singaporean Technology Product Markets and Scenario Migration to U.S. Warehousing Services
  • Jan 19, 2026
  • Journal of World Economy
  • Yiyang Wu

In zero-paid-traffic scenarios, TikTok technology live streams typically face a systemic dilemma characterized by scarce traffic entry points, inadequate audience retention, and depressed average order values. Extant research predominantly focuses on low-involvement product categories and paid growth strategies, leaving a theoretical gap in systematic investigation of organic growth mechanisms for high-involvement technology products. Grounded in attention economy theory and collaborative optimization theory, this study employs 42 live streams featuring 15 technology products from Tesen Global Technology in Singapore as a natural experiment. We construct a three-dimensional collaborative optimization model encompassing “time slots—scripts—product mix” and implement real-time attention allocation via online gradient descent algorithms, while dynamically iterating product combinations through Thompson Sampling. Validation using 73-day panel data demonstrates that post-intervention, organic follower growth increased by 208%, conversion rates rose by 125%, average order value climbed by 20.5%, and cumulative advertising expenditure savings reached $12,000; 5,000 randomization permutation tests confirm robust effects (p < 0.01). Furthermore, applying service marketing theory, we migrate the model to the U.S. small-to-medium warehousing sector, proposing an “inventory turnover rate visualization live stream + service package matrix” approach, which is projected to reduce cost-per-lead (CPL) from $180 to $90. This research establishes a multi-dimensional collaborative optimization framework for live streaming, filling theoretical voids regarding high-involvement product growth in zero-ad-spend contexts and providing a replicable methodological paradigm for organic cross-border e-commerce expansion.

  • Research Article
  • 10.1002/sim.70386
Bayesian Response-Adaptive Randomization for Cluster Randomized Controlled Trials.
  • Jan 1, 2026
  • Statistics in medicine
  • Yunyi Liu + 2 more

  • Research Article
  • 10.54254/2755-2721/2026.tj30959
Dynamic Pilot Optimization of South Korea's Urban-Rural Fertility Policies Based on Improved Sliding Window UCB Algorithm
  • Dec 31, 2025
  • Applied and Computational Engineering
  • Yang Gou

South Korea is facing a significant low-fertility rate issue, with varying success in fertility policy outcomes between urban and rural areas. The traditional fixed-area pilot model struggles to adapt to non-stationary fluctuations in fertility rates, leading to high trial-and-error costs. This study addresses the optimization of urban-rural fertility policies by proposing an enhanced Sliding Window Upper Confidence Bound (SW-UCB) algorithm that combines a sliding window with a forgetting factor. It treats Seoul and South Jeolla Province as arms of a Multi-Armed Bandit model, defining the increase in fertility rate per unit subsidy as the reward and conducting a simulated 60-month pilot. The improved algorithm demonstrates a 22.3% reduction in cumulative regret compared to the traditional UCB algorithm and a 19.4% reduction compared to Thompson Sampling, effectively accommodating fluctuations in fertility rates and aiding the precise adjustment of policies.
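
The sliding-window idea behind the SW-UCB algorithm described above can be sketched compactly: only recent observations inform the estimates, so the policy can track a drifting reward distribution. The two-armed drifting environment, window size, and seed below are invented for illustration, and the paper's forgetting factor is not reproduced:

```python
import math
import random
from collections import deque

def sliding_window_ucb(reward_fn, k, horizon, window=200, seed=0):
    """Sliding-Window UCB sketch for non-stationary bandits.

    Only the last `window` (arm, reward) pairs inform the mean estimates
    and confidence bonuses, letting the policy forget stale data.
    """
    rng = random.Random(seed)
    history = deque(maxlen=window)  # (arm, reward) pairs inside the window
    choices = []
    for t in range(1, horizon + 1):
        counts = [0] * k
        sums = [0.0] * k
        for arm, r in history:
            counts[arm] += 1
            sums[arm] += r
        if min(counts) == 0:  # replay any arm unseen in the current window
            arm = counts.index(0)
        else:
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(min(t, window)) / counts[i]))
        r = reward_fn(arm, t, rng)
        history.append((arm, r))
        choices.append(arm)
    return choices

# hypothetical drift: arm 0 is best for the first 500 rounds, arm 1 afterwards
def drifting(arm, t, rng):
    p = [0.8, 0.2] if t <= 500 else [0.2, 0.8]
    return 1.0 if rng.random() < p[arm] else 0.0

choices = sliding_window_ucb(drifting, k=2, horizon=1000)
```

After the change point, observations from the old regime age out of the window, so the policy shifts its pulls to the newly optimal arm, which plain UCB (with ever-growing counts) does much more slowly.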

  • Research Article
  • 10.54254/2755-2721/2025.ld30217
Multi-Armed Bandits and Clinical Medicine: A Survey of Algorithms, Evaluation, and Applications
  • Dec 3, 2025
  • Applied and Computational Engineering
  • Xiwen Guo

Personalized medicine and adaptive clinical trials aim to match optimal treatment plans to individual patients while validating new therapies. Traditional fixed-design trials have limitations, including resource wastage and scarce data. Artificial intelligence has led to the development of dynamic decision algorithms like the Multi-Armed Bandit (MAB) algorithm, which balances exploration and exploitation in treatment allocation. Researchers aim to integrate MAB with clinical needs while adhering to ethical guidelines. This paper discusses the use of MAB algorithms in medicine, highlighting their potential for optimizing treatment allocation and improving patient outcomes, despite ethical constraints and limited data, and their application in contextual and reinforcement learning settings. This research highlights key clinical applications such as adaptive dose-finding, personalized treatment selection, and digital health interventions, supported by both trial-based data and large-scale public datasets like MIMIC-III. Simulation studies are also discussed as a necessary complement to real-world data, facilitating algorithm validation under ethical and logistical constraints. Comparative evaluation of algorithms demonstrates that Bayesian methods, particularly Thompson Sampling and contextual bandits, often provide a more robust balance between efficiency and safety. However, challenges remain in scalability, interpretability, and regulatory acceptance. This research concludes by identifying promising directions for future research, including the integration of deep reinforcement learning and causal inference, which may further enhance the role of MABs in advancing personalized medicine and adaptive clinical trial design.

  • Research Article
  • 10.54254/2755-2721/2025.ld30166
Multi-arm Bandit Machine Exploration - Investigating the Performance Differences of Classical Algorithms Through Trade-off Analysis
  • Dec 3, 2025
  • Applied and Computational Engineering
  • Haokai Tang

The exploration-exploitation dilemma in multi-arm bandit problems has long been a classic challenge and serves as the foundation of reinforcement learning. It has applications in various industries, such as online advertising, A/B testing, and clinical medicine. There are many MAB algorithms and each has its own advantages and disadvantages. This paper analyzes the performance of three classic MAB algorithms: the simple and effective ε-Greedy; the Upper Confidence Bound algorithm (UCB1), which is more optimistic when facing uncertainty; and Thompson Sampling, an approach rooted in Bayesian inference. This paper conducts simulation experiments under the Bernoulli Machine environment using three evaluation criteria: cumulative regret, convergence speed, and parameter dependence, and comprehensively analyzes the performance of the three algorithms. The results show that Thompson sampling achieved the lowest cumulative regret and the fastest convergence speed, followed by UCB1. The performance of ε-Greedy is highly sensitive to its hyperparameters. These findings may provide some practical guidance for algorithm selection in real-world scenarios with similar properties and validate the theoretical advantages of the probability matching strategy.
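
The hyperparameter sensitivity of ε-Greedy noted above is easy to see in code: a single fixed ε governs the whole explore/exploit split. This minimal sketch uses illustrative arm probabilities, horizon, and ε, not the paper's experimental settings:

```python
import random

def epsilon_greedy(true_probs, horizon, epsilon=0.1, seed=0):
    """ε-greedy on Bernoulli arms: explore uniformly with probability ε,
    otherwise exploit the arm with the highest empirical mean."""
    rng = random.Random(seed)
    k = len(true_probs)
    counts = [0] * k
    means = [0.0] * k
    total = 0.0
    for _ in range(horizon):
        # explore at rate ε, and also until every arm has been tried once
        if rng.random() < epsilon or min(counts) == 0:
            arm = rng.randrange(k)
        else:
            arm = max(range(k), key=lambda i: means[i])
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        total += reward
    return total / horizon

avg = epsilon_greedy([0.2, 0.5, 0.8], horizon=5000)
```

Because the exploration rate never decays, ε-Greedy keeps paying a constant per-round exploration cost (here roughly ε times the mean reward gap), which is one reason UCB1 and Thompson Sampling dominate it on cumulative regret.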

  • Research Article
  • 10.69987/jacs.2025.51201
Counterfactual Learning-to-Rank for Ads: Off-Policy Evaluation on the Open Bandit Dataset
  • Dec 3, 2025
  • Journal of Advanced Computing Systems
  • Hanqi Zhang

Reliable offline evaluation is a central bottleneck in ad recommendation and ranking systems: online A/B experiments are expensive, slow, and risky, while naive offline replay is biased when logs are collected by a non-random policy. Counterfactual learning-to-rank (LTR) and off-policy evaluation (OPE) address this bottleneck by leveraging logged bandit feedback with known propensities. This paper presents a reproducible experimental study of IPS/SNIPS/DR estimators and counterfactual policy construction in a multi-position setting using the Open Bandit Dataset (OBD) released by ZOZO. We evaluate estimator behavior in cross-policy settings (Random ↔ Bernoulli Thompson Sampling), characterize heavy-tailed importance weights, and study robustness under propensity clipping. We further construct stochastic ranking policies from a fitted reward model, including a diversity-aware slate policy, and quantify the CTR–diversity trade-off via a Pareto analysis. Finally, we conduct a semi-synthetic evaluation that preserves real OBD covariates but simulates rewards from a learned environment, enabling bias–variance curves under known ground truth. Across experiments, self-normalization and doubly robust corrections improve stability, while the dominant failure mode remains limited overlap that produces heavy-tailed weights; clipping mitigates variance at the cost of controlled bias.
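
Of the estimators studied above, IPS and its self-normalized variant SNIPS are compact enough to sketch directly. The log format and toy policies below are hypothetical stand-ins, not the Open Bandit Dataset schema:

```python
def ips_snips(logs, target_policy):
    """Inverse Propensity Scoring (IPS) and Self-Normalized IPS (SNIPS).

    `logs` is a list of (context, action, reward, propensity) tuples
    collected by the logging policy; `target_policy(context, action)`
    returns the probability the evaluated policy takes that action.
    """
    weights = [target_policy(x, a) / p for x, a, r, p in logs]
    rewards = [r for _, _, r, _ in logs]
    n = len(logs)
    weighted = sum(w * r for w, r in zip(weights, rewards))
    ips = weighted / n
    snips = weighted / sum(weights) if sum(weights) > 0 else 0.0
    return ips, snips

# toy check: logging policy uniform over 2 actions (propensity 0.5),
# target policy deterministic on action 1
logs = [(None, 0, 0.0, 0.5), (None, 1, 1.0, 0.5),
        (None, 1, 1.0, 0.5), (None, 0, 0.0, 0.5)]
target = lambda x, a: 1.0 if a == 1 else 0.0
ips, snips = ips_snips(logs, target)
# ips = (0 + 2 + 2 + 0) / 4 = 1.0; snips = 4 / 4 = 1.0
```

Dividing by the sum of the importance weights instead of `n` is what gives SNIPS its lower variance under the heavy-tailed weights the abstract describes, at the cost of a small bias.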

  • Research Article
  • 10.54254/2753-8818/2026.ch30042
An Empirical Comparison of Bayesian LinUCB, UCB, and Thompson Sampling for Recommendation on MovieLens
  • Nov 26, 2025
  • Theoretical and Natural Science
  • Jingyun Wang

Recommender systems have evolved into core business hubs, with approximately 35% of Amazon's revenue stemming from recommendation-guided behaviors. This study conducts a systematic comparative analysis of three multi-armed bandit algorithms: Bayesian Linear Upper Confidence Bound (Bayesian LinUCB), Upper Confidence Bound (UCB), and Thompson Sampling, using the MovieLens dataset. The research evaluates algorithm performance across three key dimensions: cumulative regret, optimal arm selection frequency, and regret rate. Experimental variables are strictly controlled with consistent parameters, including decision steps and data division ratios to eliminate confounding factors. Results reveal significant performance differences among the algorithms within the limited experimental steps on the MovieLens dataset. UCB demonstrates optimal performance with the lowest cumulative regret (817.93) and highest optimal arm selection frequency (0.9822), followed by Thompson Sampling with moderate performance (cumulative regret: 2776.36, selection frequency: 0.924). Bayesian LinUCB performs poorly across all metrics, showing the highest cumulative regret (34105.02), lowest selection frequency (0.1324), and a regret rate of approximately 1, indicating linear rather than sublinear growth. The sublinear growth characteristic exhibited by UCB and Thompson Sampling confirms their superior exploration-exploitation balance, while Bayesian LinUCB's linear growth pattern suggests inadequate adaptation to the MovieLens dataset scenario, highlighting the importance of algorithm-dataset compatibility in recommendation systems.

  • Research Article
  • 10.54254/2753-8818/2026.ch29998
Contextual Multi-Armed Bandits for Dynamic News Recommendation: An Empirical Evaluation
  • Nov 26, 2025
  • Theoretical and Natural Science
  • Jiashuo Wang

With the advent of the information explosion era, personalized news recommendation faces critical challenges including cold start problems, real-time changes in user preferences, and information filter bubbles. Traditional collaborative filtering methods rely heavily on historical data and struggle to adapt to the rapid update characteristics of news content. This paper proposes a news recommendation solution based on Multi-Armed Bandit (MAB) algorithms, addressing these challenges by balancing exploration and exploitation. The study implements four core algorithms: the ε-greedy algorithm balances exploration and exploitation through probability mechanisms; the Upper Confidence Bound (UCB) algorithm employs optimistic estimation using confidence upper bounds; Thompson sampling adopts probability adaptation based on a Bayesian framework; and Contextual Linear Bandit (LinUCB) integrates user and news features for personalized recommendations. Experiments on the MIND large-scale news dataset (containing 160,000 news articles, 1 million users, and 15 million click interactions) demonstrate that contextual bandit algorithms outperform traditional methods in click-through rate, dwell time, and recommendation diversity. Thompson sampling shows outstanding performance in click-through rates, while LinUCB excels in convergence speed and recommendation diversity. The experiments confirm that MAB algorithms can effectively adapt to dynamic changes in user preferences, providing a viable solution for real-time news recommendation systems.
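
The LinUCB idea mentioned above (a ridge-regression reward model plus an optimism bonus over feature vectors) can be sketched in two dimensions with explicit matrix arithmetic. All features, the hidden parameter vector, the noise level, and the seed below are invented for illustration:

```python
import random

def inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def linucb(arm_features, true_theta, horizon, alpha=1.0, seed=0):
    """LinUCB sketch with a shared 2-d linear reward model.

    Maintains the ridge estimate theta_hat = A^-1 b and each round plays
    the arm maximizing x . theta_hat + alpha * sqrt(x^T A^-1 x).
    """
    rng = random.Random(seed)
    A = [[1.0, 0.0], [0.0, 1.0]]  # ridge regularizer: identity
    b = [0.0, 0.0]
    choices = []
    for _ in range(horizon):
        Ainv = inv2(A)
        theta = matvec(Ainv, b)

        def score(x):
            av = matvec(Ainv, x)
            return (x[0] * theta[0] + x[1] * theta[1]
                    + alpha * (x[0] * av[0] + x[1] * av[1]) ** 0.5)

        arm = max(range(len(arm_features)),
                  key=lambda i: score(arm_features[i]))
        x = arm_features[arm]
        mean = x[0] * true_theta[0] + x[1] * true_theta[1]
        reward = mean + rng.gauss(0.0, 0.1)  # noisy linear reward
        # rank-one update of A and b with the observed (x, reward)
        for i in range(2):
            for j in range(2):
                A[i][j] += x[i] * x[j]
            b[i] += reward * x[i]
        choices.append(arm)
    return choices

arms = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
choices = linucb(arms, true_theta=[0.2, 0.8], horizon=500)
```

With `true_theta=[0.2, 0.8]`, arm 1 (feature `[0, 1]`) has the highest expected reward, so its pull count should dominate as the confidence ellipsoid shrinks.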

  • Research Article
  • 10.54254/2753-8818/2026.ch30041
Comparative Analysis of ETC, UCB, and Thompson Sampling for Personalized Video Recommendations on Short-Video Platform
  • Nov 26, 2025
  • Theoretical and Natural Science
  • Shuqiao Chen

This study empirically compares three canonical Multi-Armed Bandit (MAB) algorithms: Explore-Then-Commit (ETC), with fixed initial exploration; Upper Confidence Bound (UCB1), with optimism-driven uncertainty estimation; and Thompson Sampling with a Bernoulli likelihood (TS-Bernoulli), based on posterior sampling, for short-video recommendation, aiming to solve the exploration-exploitation tradeoff in real-time feed systems. Experiments were conducted on the ShortVideo-Interactions (SVI-200K) dataset, a simulated corpus with ~1.2 million timestamped impressions and clicks from 240,000 user sessions over 30 days, covering ~18,000 unique items to mimic real platform dynamics. Evaluations used a fixed horizon (T=2000 timesteps) and restricted candidates to the top 200 items (K=200) per run, spanning three practical scenarios: stable base, information-scarce cold-start (new items with no prior data), and preference-drifting temporal-shift. Results, aggregated over three pseudo-random seeds (2025, 2026, 2027), show TS-Bernoulli consistently outperforms peers: it achieves the highest Click-Through Rate (CTR) (0.452 in base, 0.402 in cold-start, 0.428 in temporal-shift) and lowest cumulative regret (418, 518, 467 respectively). These findings confirm that TS-Bernoulli's posterior sampling enables robust adaptation to short-video recommendation's key challenges (information scarcity and non-stationarity), providing a practical algorithm choice for real-world platforms.
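
Of the three algorithms compared above, Explore-Then-Commit is the simplest to state: pull every arm a fixed number of times, then commit to the empirical winner for the rest of the horizon. A minimal Bernoulli sketch, with illustrative arm probabilities, horizon, and exploration budget rather than the paper's settings:

```python
import random

def explore_then_commit(true_probs, horizon, m=50, seed=0):
    """Explore-Then-Commit: pull each arm m times, then commit for the
    remainder of the horizon to the arm with the best empirical mean."""
    rng = random.Random(seed)
    k = len(true_probs)
    means = []
    total = 0.0
    for i in range(k):  # exploration phase: m pulls per arm
        wins = sum(rng.random() < true_probs[i] for _ in range(m))
        means.append(wins / m)
        total += wins
    best = means.index(max(means))  # commit phase: exploit forever after
    for _ in range(horizon - m * k):
        total += 1.0 if rng.random() < true_probs[best] else 0.0
    return best, total / horizon

best, avg = explore_then_commit([0.3, 0.7], horizon=2000, m=100)
```

The hard cutoff is exactly what makes ETC fragile in the cold-start and temporal-shift scenarios above: once committed, it can never recover from an unlucky exploration phase or adapt to drift, whereas TS-Bernoulli keeps sampling from its posteriors.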

  • Research Article
  • 10.1149/ma2025-0283605mtgabs
Enhancing Fast-Charging Protocols with Section-Based Bayesian Optimization for Lithium-Ion Batteries to Prevent Li-Plating
  • Nov 24, 2025
  • ECS Meeting Abstracts
  • Yoon-Mo Lee + 3 more

Global adoption of electric vehicles is accelerating to achieve carbon-neutrality goals, yet charging time remains a major barrier to user acceptance. Reports show that the Tesla Model 3 (82 kWh) and Hyundai Ioniq 5 (77.4 kWh) require about 25 and 18 minutes, respectively, to reach 80% state of charge (SOC), which is still longer than internal combustion refueling [1]. To address this, the U.S. Department of Energy (DOE) and the U.S. Advanced Battery Consortium (USABC) have targeted extreme fast charging to 80% SOC within 15 minutes [2]. However, high charging currents can cause premature cut-off, incomplete electrode utilization, and accelerated degradation, while non-uniform current distribution may trigger Li-plating near the separator [3,4]. Recently, model-based charging protocols with Bayesian optimization (BO) have gained traction [5–7], but they rarely incorporate direct constraints to suppress Li-plating. Here, we propose a framework that integrates a physics-based electrochemical model with BO to optimize fast-charging protocols for lithium-ion batteries [8,9]. The model, validated against experimental data with an average error of 25 mV in voltage and 0.26 °C in temperature across multiple C-rates, enables direct control of Li-plating potential as a safety constraint. Using a commercial 55.6 Ah pouch-type cell, two multi-step constant-current strategies were compared: a single-section protocol and a bi-section protocol that partitions the SOC window based on internal resistance. The bi-section approach reduced charging time by up to 11% relative to the single-section method, while maintaining plating-free operation and suppressing SEI growth. Under high-temperature conditions with preheated cells at 60 °C, the optimized protocol achieved 0–80% SOC in 629 s (10.5 min), thereby meeting the USABC 15-min target. 
Compared with the conventional CCCV method, the proposed BO-based protocols shortened charging time by up to 20% while reducing capacity degradation. Cycling tests directly comparing the optimized BO-based protocols with CCCV revealed significantly improved capacity retention and reduced degradation over repeated operation. Post-mortem analyses including SEM, XPS, and EDS further confirmed that cells charged with the optimized protocols exhibited markedly less lithium deposition, thinner SEI layers, and more intact graphite morphology than those charged under CCCV. These results provide strong experimental validation that the proposed section-based BO framework not only reduces charging time but also extends cell lifetime by mitigating key degradation pathways. Overall, the study demonstrates the practical applicability of the optimized protocols for safe, efficient, and plating-free fast charging of large-format EV batteries under diverse thermal conditions.

References

1. Mateen S, Amir M, Haque A, Bakhsh FI. Ultra-fast charging of electric vehicles: a review of power electronics converter, grid stability and optimal battery consideration in multi-energy systems. Sustain Energy Grids 2023;35.
2. Neubauer J, Pesaran A, Bae C, Elder R, Cunningham B. Updating United States Advanced Battery Consortium and Department of Energy battery technology targets for battery electric vehicles. J Power Sources 2014;271:614–621.
3. Yang XG, Wang CY. Understanding the trilemma of fast charging, energy density and cycle life of lithium-ion batteries. J Power Sources 2018;402:489–
4. Lin XK, Khosravinia K, Hu XS, Li J, Lu W. Lithium plating mechanism, detection, and mitigation in lithium-ion batteries. Prog Energ Combust 2021;87.
5. Jiang BB, Berliner MD, Lai K, Asinger PA, Zhao HB, Herring PK, Bazant MZ, Braatz RD. Fast charging design for lithium-ion batteries via Bayesian optimization. Appl Energ 2022;307.
6. Attia PM, Grover A, Jin N, Severson KA, Markov TM, Liao YH, Chen MH, Cheong B, Perkins N, Yang Z, Herring PK, Aykol M, Harris SJ, Braatz RD, Ermon S, Chueh WC. Closed-loop optimization of fast-charging protocols for batteries with machine learning. Nature 2020;578:397.
7. Song XB, Jiang BB. Parallel Bayesian optimization using satisficing Thompson sampling for fast charging design of lithium-ion batteries. Eng Appl Artif Intell 2025;15
8. Doyle M, Newman J, Gozdz AS, Schmutz CN, Tarascon JM. Comparison of modeling predictions with experimental data from plastic lithium-ion cells. J Electrochem Soc 1996;143:1890–
9. Arora P, Doyle M, Gozdz AS, White RE, Newman J. Comparison between computer simulations and experimental data for high-rate discharges of plastic lithium-ion batteries. J Power Sources 2000;88:219–3
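The bi-section protocol structure described in the abstract can be sketched in miniature. The toy code below is not the paper's method: the validated electrochemical model is replaced by a hypothetical linear surrogate for the minimum anode potential, the BO loop by a brute-force grid, and the 50% SOC split, C-rate grid, and surrogate coefficients are all illustrative. It only shows how a plating-free constraint (anode potential kept above 0 V vs. Li/Li+) prunes aggressive two-section current profiles before charging time is minimized.

```python
import itertools

def charge_time_s(i1, i2, soc_split=0.5, capacity_ah=55.6):
    """Toy charging time for a bi-section CC protocol: section 1 charges
    0 -> soc_split at C-rate i1, section 2 charges soc_split -> 80% at i2."""
    # time (h) = charge moved (Ah) / current (A), current = C-rate * capacity
    t1 = soc_split * capacity_ah / (i1 * capacity_ah)
    t2 = (0.8 - soc_split) * capacity_ah / (i2 * capacity_ah)
    return 3600.0 * (t1 + t2)

def min_anode_potential_v(i1, i2):
    """Hypothetical linear surrogate for the minimum anode potential vs.
    Li/Li+ during charge; higher section currents push it lower."""
    return 0.12 - 0.025 * i1 - 0.035 * i2

def best_protocol(candidates):
    """Fastest candidate whose anode potential never crosses 0 V (plating-free)."""
    feasible = [(c, charge_time_s(*c)) for c in candidates
                if min_anode_potential_v(*c) > 0.0]
    return min(feasible, key=lambda pair: pair[1])

# Grid of (section-1, section-2) C-rates standing in for the BO search space.
grid = list(itertools.product([1.0, 2.0, 3.0, 4.0], repeat=2))
(i1, i2), t = best_protocol(grid)
```

With these made-up numbers the search picks a front-loaded profile (high current at low SOC, lower current near the plating-sensitive region), which is the qualitative shape the section-based protocols exploit.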

  • Research Article
  • 10.3389/fpls.2025.1699124
Distributed multi-robot active gathering for non-uniform agriculture and forestry information
  • Oct 22, 2025
  • Frontiers in Plant Science
  • Jun Chen + 5 more

Active information gathering is a fundamental task for multi-robot systems in agriculture, with applications in precision planting and sowing, field management and inspection, intelligent weeding and pest control, etc. Traditional distributed strategies often struggle to adapt to environments where the information of interest is unevenly clustered, leading to slow detection and inefficient coverage. In this paper, we reformulate the information gathering problem as a multi-armed bandit (MAB) problem and propose a novel distributed Bernoulli Thompson Sampling algorithm. Our approach enables robots to make exploration-exploitation decisions while sharing probabilistic information across the team, thus improving global coordination without centralized control. We further combine the distributed Bernoulli Thompson Sampling policy with Lloyd’s algorithm for dynamic target tracking and introduce a goal swapping strategy to improve task allocation efficiency. Extensive simulations demonstrate that our method significantly outperforms baseline approaches in terms of search speed and target coverage, particularly in scenarios with clustered target distributions.
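For reference, the single-agent Bernoulli Thompson Sampling rule at the core of the proposed distributed algorithm can be sketched as below; the distributed information sharing, Lloyd's algorithm, and goal swapping from the paper are omitted, and the arm probabilities, horizon, and seed are illustrative.

```python
import random

def thompson_sampling(arms_p, horizon=2000, seed=0):
    """Bernoulli Thompson Sampling with Beta(1, 1) priors.

    arms_p: true (unknown) success probabilities, used only to simulate
    rewards. Returns per-arm Beta posterior parameters and total reward.
    """
    rng = random.Random(seed)
    k = len(arms_p)
    alpha = [1] * k  # successes + 1
    beta = [1] * k   # failures + 1
    total = 0
    for _ in range(horizon):
        # Draw one plausible mean per arm from its Beta posterior...
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # ...and play the arm whose draw is largest (randomized greediness).
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < arms_p[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total += reward
    return alpha, beta, total

alpha, beta, total = thompson_sampling([0.2, 0.5, 0.8])
pulls = [alpha[i] + beta[i] - 2 for i in range(3)]  # times each arm was played
```

Because posterior draws for well-explored poor arms concentrate below the best arm's draws, play concentrates on the best arm over time; uncertainty alone drives exploration, with no explicit exploration schedule.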

  • Research Article
  • 10.1145/3771931
A Reward-Informed Semi-Personalized Bandit Approach for Enhancing Accuracy and Serendipity in Online Slate Recommendations
  • Oct 21, 2025
  • ACM Transactions on Recommender Systems
  • Lukas De Kerpel + 1 more

Contextual bandits provide a principled framework for personalization in online recommendation settings. However, as these methods tailor recommendation slates to an individual user, they tend to induce overspecialization, yielding homogeneous recommendation lists that limit exposure to diverse content and contribute to more systemic issues such as filter bubbles and echo chambers. To mitigate these effects, recommender systems must complement predictive accuracy with serendipity, providing recommendations that are novel and unexpected while remaining contextually relevant. This study proposes a semi-personalized bandit that, for each item, learns a decision tree to segment users by contextual features and reward patterns, and runs a unique Thompson Sampling policy for each user segment to create recommendation slates. By pooling information across behaviorally similar users and conducting the exploration mechanism at the user segment level, the framework mitigates overspecialization issues and promotes serendipitous recommendations. Moreover, the approach is inherently interpretable, with decision trees revealing decision pathways that define user segments, offering insights into recommendation logic. Experiments across three different online domains show that the semi-personalized framework reduces average regret relative to personalized baselines while improving serendipity in sparse interaction settings. These findings underscore the potential of semi-personalized bandits to improve recommendation quality in complex environments.
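The segment-level mechanism can be given a minimal sketch. This is not the paper's implementation: the per-item decision-tree segmentation is replaced by a caller-supplied segment key, and the item count, slate size, and simulated feedback are illustrative. It only shows the core idea of one Bernoulli Thompson Sampling policy per segment, with users in a segment pooling their feedback.

```python
import random

class SegmentThompsonSampling:
    """One Bernoulli Thompson Sampling policy per user segment.

    Each segment keeps independent Beta(1, 1) posteriors over item click
    probabilities; slates are built by ranking posterior samples.
    """

    def __init__(self, n_items, seed=0):
        self.n_items = n_items
        self.rng = random.Random(seed)
        self.posteriors = {}  # segment key -> per-item [successes+1, failures+1]

    def _segment(self, key):
        return self.posteriors.setdefault(
            key, [[1, 1] for _ in range(self.n_items)])

    def recommend_slate(self, key, slate_size):
        post = self._segment(key)
        # Thompson step: draw one plausible click rate per item, rank by draw.
        scores = [self.rng.betavariate(a, b) for a, b in post]
        return sorted(range(self.n_items), key=lambda i: -scores[i])[:slate_size]

    def update(self, key, item, clicked):
        post = self._segment(key)
        post[item][0] += clicked        # posterior success count
        post[item][1] += 1 - clicked    # posterior failure count

# Feedback is pooled within a segment; other segments remain unexplored
# and so still receive near-uniform (exploratory) slates.
policy = SegmentThompsonSampling(n_items=4)
for _ in range(50):
    policy.update("segment_a", 0, 1)   # this segment reliably clicks item 0
    for item in (1, 2, 3):
        policy.update("segment_a", item, 0)
```

Exploring per segment rather than per user is what spreads the cost of trying unexpected items across behaviorally similar users, which is how the framework buys serendipity without per-user regret blowing up.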


Copyright 2026 Cactus Communications. All rights reserved.
