
Related Topics

  • Reinforcement Learning Control
  • Deep Reinforcement Learning
  • Reinforcement Learning Algorithm
  • Multi-agent Reinforcement Learning
  • Reinforcement Learning

Articles published on Reinforcement Learning Methods

3547 Search results (sorted by recency)
  • New
  • Research Article
  • 10.14254/jsdtl.2026.11-1.01
Methodology for integrated inventory optimisation in production and trading enterprises: A systematic review and meta-analytic synthesis
  • Mar 3, 2026
  • Journal of Sustainable Development of Transport and Logistics
  • Tetiana Kashtalian

Purpose. This study aims to synthesise empirical and modelling evidence on inventory optimisation methods for raw materials, work-in-process, and finished goods in production and trading enterprises, and to translate that evidence into a practical, class-differentiated implementation framework deployable within standard warehouse management and enterprise resource planning systems.

Methodology. A systematic review and meta-analytic synthesis of 31 peer-reviewed studies published between 2004 and 2025 was conducted following the PRISMA 2020 protocol. A random-effects model estimated by restricted maximum likelihood was applied to pool percentage cost-reduction effect sizes across 18 studies admissible to quantitative synthesis, complemented by a narrative synthesis of the remaining 13 studies. Pre-specified subgroup and moderator analyses examined the role of inventory class, demand pattern, and network complexity as effect-size moderators.

Results. Distributional safety stock methods outperform classical normal approximations by a pooled mean of 9.3% (95% CI: 5.8–12.7%) at equivalent service levels, with the advantage being largest for high-variability SKU segments. Multi-echelon coordination yields a pooled mean cost reduction of 11.4% (95% CI: 6.9–15.9%), increasing significantly with network complexity and lead-time variability. Learning-based control methods deliver up to 16% cost reductions under complex network conditions but require substantial data and governance infrastructure. Commercial demand drivers systematically distort finished-goods inventory targets and require integration with sales-and-operations planning for accurate calibration.

Theoretical contribution. The study provides the first cross-class synthesis covering raw materials, work-in-process, and finished goods within a unified evaluative framework, positioning machine learning and deep reinforcement learning methods alongside classical policy families and quantifying the boundary conditions for each approach.

Practical implications. A six-phase, stepwise implementation framework is proposed, covering ABC-XYZ segmentation, forecast model selection, safety stock calibration, replenishment policy assignment, simulation-based parameter tuning, and KPI governance, enabling enterprises to achieve 9–16% reductions in inventory costs within existing WMS and ERP architectures.

Sustainable Development Goals (SDGs): SDG 8: Decent Work and Economic Growth; SDG 9: Industry, Innovation and Infrastructure; SDG 12: Responsible Consumption and Production; SDG 17: Partnerships for the Goals
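
The pooled estimates above come from a random-effects model fitted by restricted maximum likelihood; as a minimal sketch of how random-effects pooling works, the snippet below uses the simpler DerSimonian-Laird estimator and made-up effect sizes and variances (not the paper's data):

```python
import math

def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling of study effect sizes."""
    w = [1.0 / v for v in variances]              # fixed-effect weights
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    # Cochran's Q and between-study variance tau^2
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # random-effects weights and pooled estimate with 95% CI
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# hypothetical percentage cost reductions and their sampling variances
effects = [9.0, 12.5, 7.8, 11.2]
variances = [4.0, 6.2, 3.1, 5.5]
pooled, ci = random_effects_pool(effects, variances)
```

REML pooling, as used in the study, differs only in how the between-study variance is estimated; the weighting and CI construction follow the same pattern.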

  • New
  • Research Article
  • Cited by 1
  • 10.1109/jiot.2025.3565325
GRWS: A Deep Reinforcement Learning Method With Graph Attention Networks for Flexible Workflow Scheduling in Industrial Manufacturing Scenarios
  • Mar 1, 2026
  • IEEE Internet of Things Journal
  • Yuzhe Huang + 4 more


  • New
  • Research Article
  • 10.1016/j.neucom.2025.132578
LECMARL: A cooperative multi-agent reinforcement learning method based on lazy mechanisms and efficient exploration
  • Mar 1, 2026
  • Neurocomputing
  • Yukang Cao + 3 more


  • New
  • Research Article
  • 10.1016/j.engappai.2026.113761
A preference-based Reinforcement Learning method of maneuver decision-making in air combat
  • Mar 1, 2026
  • Engineering Applications of Artificial Intelligence
  • An Zhang + 3 more


  • New
  • Research Article
  • Cited by 1
  • 10.1016/j.segan.2025.102075
Optimal management of green hydrogen production in renewable energy systems using deep reinforcement learning methods
  • Mar 1, 2026
  • Sustainable Energy, Grids and Networks
  • Donguk Yang + 3 more


  • New
  • Research Article
  • 10.1016/j.compchemeng.2025.109515
A survey and tutorial of reinforcement learning methods in Process Systems Engineering
  • Mar 1, 2026
  • Computers & Chemical Engineering
  • Maximilian Bloor + 3 more


  • New
  • Research Article
  • 10.3390/futuretransp6020056
Parameterized Reinforcement Learning with Route Guidance for Controlling Urban Road Traffic Networks
  • Feb 28, 2026
  • Future Transportation
  • Edwin M Kataka + 3 more

Traditional macroscopic fundamental diagram (MFD)-based traffic perimeter metering control strategies rely on full knowledge of vehicle accumulation and inter-regional flow dynamics, assumptions that seldom hold in heterogeneous and highly variable real-world networks. Classical data-driven reinforcement learning methods face similar constraints, often converging slowly and exhibiting low sample efficiency when confronted with such complexities. Motivated by these limitations, this paper proposes a Parameterized Deep Q-Network perimeter control (P-DQNPC) scheme designed for multi-region urban road networks. The framework jointly optimizes discrete actions (regional routing choices) and continuous actions (signal-timing or flow-duration regulation) within a model-free learning structure. The approach is first trained and validated on synthetic MFD data to establish stable and interpretable policy behavior under controlled conditions. It is then transferred and further evaluated using real-world measurements from the Performance Measurement System—San Francisco Bay Area (PeMS-SF), a dataset collected from 18,954 loop detectors across the California State Highway System. PeMS-SF is selected due to its high spatial and temporal resolution, broad network coverage, and strong ability to capture realistic and diverse congestion patterns, qualities that support both rigorous validation and generalization to other metropolitan regions. Experimental results show that P-DQNPC consistently outperforms state-of-the-art baselines, including deep deterministic policy gradient, deep Q-network, and No-Control schemes. The proposed method achieves superior regulation of regional accumulations and demonstrates enhanced robustness in large, heterogeneous, and uncertain urban traffic environments.
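
The core idea of a parameterized DQN is that every discrete action k (e.g. a routing choice) carries its own continuous parameter x_k (e.g. a signal split), and the agent picks argmax_k Q(s, k, x_k(s)). A toy sketch of that action-selection step, where the linear "networks" and the state features are stand-ins rather than the paper's model:

```python
import random

def param_actor(state, k):
    """Toy continuous-parameter head for discrete action k (stand-in for
    a neural network): maps state features to a bounded signal split."""
    raw = sum(state) * (k + 1) * 0.1
    return max(0.0, min(1.0, raw))          # clip to [0, 1]

def q_value(state, k, x):
    """Toy Q head (stand-in for a neural network)."""
    return -abs(sum(state) * 0.2 - x) - 0.05 * k

def select_action(state, n_discrete=3, epsilon=0.1):
    """Epsilon-greedy over parameterized actions (k, x_k)."""
    if random.random() < epsilon:           # exploration
        k = random.randrange(n_discrete)
        return k, param_actor(state, k)
    # greedy: evaluate Q at each action's own continuous parameter
    scored = [(q_value(state, k, param_actor(state, k)), k)
              for k in range(n_discrete)]
    q_best, k_best = max(scored)
    return k_best, param_actor(state, k_best)

random.seed(0)
k, x = select_action([0.4, 0.7])
```

In training, the Q head and the parameter heads are updated jointly, the former by temporal-difference error and the latter by gradients through Q.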

  • New
  • Research Article
  • 10.3390/iot7010023
Edge AI for SD-IoT: A Systematic Review on Scalability and Latency
  • Feb 27, 2026
  • IoT
  • Ernando P Batista + 4 more

The growing demand for IoT applications in highly dynamic environments with multiple connected devices introduces significant scalability and low-latency challenges. In the context of software-defined networking (SDN) integrated with Edge environments, adopting machine learning (ML) techniques has emerged as a promising approach to meet these requirements. This study presents a Systematic Literature Review (SLR) that identifies and analyzes ML-based solutions applied to Software-Defined Internet of Things (SD-IoT) infrastructures in Edge environments, emphasizing low latency and scalability. The review was conducted following established methodological best practices, including a clear definition of research questions, well-defined inclusion and exclusion criteria, a structured search protocol, and searches across multiple scientific databases. Based on the analysis of selected studies, the main strategies employed to enhance network performance are categorized, along with the level of fidelity and complexity of the experimental environments used, and the realism and applicability of the proposed solutions are discussed. Furthermore, drawing from the context of the selected studies, the most recurrent ML approaches are presented—including supervised, unsupervised, and reinforcement learning methods—along with a discussion of their advantages and limitations in dynamic network scenarios. By compiling and organizing the contributions from the literature, this paper provides a comprehensive overview of the state of the art in applying ML to SD-IoT networks, shedding light on current trends, existing gaps, and research opportunities aimed at building more intelligent and adaptable solutions for IoT environments.

  • New
  • Research Article
  • 10.1007/s12530-026-09799-w
Chaotic dung beetle optimization–enhanced multi-agent deep reinforcement learning for joint task offloading and resource allocation in multi-unmanned aerial vehicle internet of medical things networks
  • Feb 17, 2026
  • Evolving Systems
  • Gauri Kalnoor + 1 more

The rapid growth of connected medical devices generates massive volumes of heterogeneous health data that must be processed and transmitted in real time. In such environments, minimizing latency and energy consumption remains a critical challenge for next-generation health monitoring systems. Existing reinforcement learning and optimization methods for intelligent communication networks face several challenges, including slow convergence, high computational overhead, and inefficiency in handling task prioritization. To resolve these issues, this work develops a chaotic dung beetle optimization-boosted multi-agent deep reinforcement learning framework that jointly optimizes communication reliability, computational efficiency, and task prioritization. A reward function is designed to jointly minimize delay, energy usage, and system cost while preserving information freshness. Specifically, the dung beetle optimization process is combined with a piecewise linear chaotic map to enhance population diversity, which significantly improves search space exploration and leads to faster convergence and higher solution quality. The proposed algorithm enhances the exploration capability of multi-agent deep reinforcement learning through the integration of chaotic dung beetle optimization, enabling more accurate and reliable decision-making in real-world applications. Extensive experiments demonstrate that the proposed chaotic dung beetle optimization-boosted multi-agent deep reinforcement learning model achieves superior performance compared to baseline algorithms. Specifically, it reaches an accuracy of over 97.00% with rapid convergence, reduces system cost under varying health data sizes and Medical Internet of Things devices, and maintains robust scalability across diverse workloads. Moreover, the model achieves significant reductions in communication latency and energy consumption as central processing unit cycles and bandwidth increase, while effectively prioritizing high-criticality tasks.
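
The piecewise linear chaotic map (PWLCM) mentioned above is easy to illustrate: iterating it yields a well-spread deterministic sequence that can seed a diverse optimizer population. A minimal sketch, where the bounds, seed value, and map breakpoint are hypothetical:

```python
def pwlcm(x, p=0.7):
    """Piecewise linear chaotic map on (0, 1) with breakpoint p."""
    if x < p:
        return x / p
    return (1.0 - x) / (1.0 - p)

def chaotic_population(n, dim, lo, hi, seed=0.37):
    """Initialize a search population from a PWLCM chaotic sequence,
    mapped into the decision bounds [lo, hi]."""
    pop, x = [], seed
    for _ in range(n):
        individual = []
        for _ in range(dim):
            x = pwlcm(x)
            individual.append(lo + x * (hi - lo))
        pop.append(individual)
    return pop

pop = chaotic_population(n=5, dim=3, lo=-1.0, hi=1.0)
```

Compared with uniform random initialization, chaotic sequences like this tend to cover the search space more evenly for small populations, which is the diversity effect the abstract attributes to the PWLCM.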

  • New
  • Research Article
  • 10.1080/00295639.2025.2598170
Research Reactor Core Loading Optimization: Enabling Machine Learning Applications by Employing Surrogate Models
  • Feb 7, 2026
  • Nuclear Science and Engineering
  • Julia Bartos + 5 more

Recent advancements in machine learning (ML) algorithms and applications have made it possible for ML models to solve complex problems, such as reactor core loading optimization, which represents a multiobjective optimization problem with a high degree of freedom. This study aims to provide a proof of concept for an ML-based core loading optimization scheme aimed at research reactors. As a case study we selected the High Flux Reactor in Petten, the Netherlands. Two optimization algorithms are used in this study: genetic algorithm (GA) and reinforcement learning (RL). The goal is to increase the thermal neutron flux at specific locations in the reactor core while adhering to established safety constraints. The optimization schemes also utilized neural network–based surrogate models to substitute for the computationally intensive core calculations. The surrogate models are used to predict core parameters (such as the neutron flux, control rod position, and heat flux) for any given loading pattern. Our results show that ML-based core loading optimization has the potential to become a viable alternative to the traditional core optimization methods. Both the GA and RL methods were able to generate core loading patterns where the neutron flux was similar in most target locations to the results obtained with the traditional method.
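
The study pairs a genetic algorithm with neural-network surrogate models so that candidate loading patterns can be scored without expensive core calculations. A minimal sketch of surrogate-evaluated genetic search; the binary "loading pattern" encoding and the `toy_surrogate` scoring function are hypothetical stand-ins, not the paper's models:

```python
import random

def toy_surrogate(pattern):
    """Stand-in for a neural-network surrogate: scores a loading pattern
    (higher = more flux at target locations, hypothetical weights)."""
    return sum(g * w for g, w in zip(pattern, (3, 1, 2, 1, 3)))

def genetic_search(pop_size=20, dim=5, generations=30, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=toy_surrogate, reverse=True)
        parents = pop[: pop_size // 2]            # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, dim)           # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(dim)                # occasional point mutation
            child[i] ^= rng.random() < 0.1
            children.append(child)
        pop = parents + children
    return max(pop, key=toy_surrogate)

best = genetic_search()
```

In the paper's setting, safety constraints (e.g. control rod position, heat flux) would additionally be checked via the surrogate predictions before a pattern is accepted.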

  • New
  • Research Article
  • 10.1007/s00500-025-10997-y
A hierarchical deep reinforcement learning method for dragging and adjusting objects with dual-arm robot
  • Feb 7, 2026
  • Soft Computing
  • Saeed Saeedvand + 3 more


  • New
  • Research Article
  • 10.1142/s2301385027500701
A Hybrid Communication Method for UAV Swarms Based on ADG-MASAC
  • Feb 6, 2026
  • Unmanned Systems
  • Pingping Qu + 7 more

Efficient resource allocation for unmanned aerial vehicle (UAV) swarms is a critical challenge, complicated by severe interference between UAV-to-UAV (U2U) and UAV-to-infrastructure (U2I) communications. Traditional Multi-Agent Reinforcement Learning (MARL) methods often prove insufficient in this domain due to two fundamental limitations: the policy sacrifice phenomenon, wherein uncoordinated agent competition leads to suboptimal outcomes, and the curse of dimensionality, which impedes effective learning in large swarms. To address these limitations, this paper proposes the Attention-based and Dynamic Gateway Multi-Agent Soft Actor-Critic (ADG-MASAC), a novel MARL framework. Our approach integrates a dynamic gateway mechanism to convert chaotic competition into structured collaboration via dynamic role assignment and an attention-based critic to enable precise perception of high-dimensional global states. Experimental results demonstrate that ADG-MASAC not only resolves the policy sacrifice issue but also achieves substantial performance gains in both U2U and U2I communications. Ablation studies further confirm that the synergy between these two mechanisms is essential for the algorithm’s success.
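
The attention-based critic described above weights other agents' information by relevance rather than concatenating it wholesale. A minimal sketch of the underlying operation, single-head scaled dot-product attention; the embeddings here are made up and this is not the ADG-MASAC architecture itself:

```python
import math

def scaled_dot_attention(query, keys, values):
    """Single-head scaled dot-product attention: a critic can use this to
    weight other agents' state embeddings by relevance to its own query."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)
    exp = [math.exp(s - m) for s in scores]       # numerically stable softmax
    z = sum(exp)
    weights = [e / z for e in exp]
    # context = attention-weighted sum of value vectors
    ctx = [sum(wi * v[i] for wi, v in zip(weights, values))
           for i in range(len(values[0]))]
    return ctx, weights

ctx, w = scaled_dot_attention(
    query=[0.2, 0.8],
    keys=[[0.1, 0.9], [0.9, 0.1], [0.5, 0.5]],    # other agents' embeddings
    values=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
)
```

Because the weights sum to one, the critic's input size stays fixed as the swarm grows, which is what makes attention attractive against the curse of dimensionality the abstract mentions.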

  • Research Article
  • 10.3390/s26030965
MURM-A*: An Improved A* Within Comprehensive Path-Planning Scheme for Cellular-Connected Multi-UAVs Based on Radio Map and Complex Network
  • Feb 2, 2026
  • Sensors (Basel, Switzerland)
  • Yanming Chai + 4 more

For the purpose of fulfilling the dual requirements of persistent cellular network connectivity and flight safety for cellular-connected Unmanned Aerial Vehicles (UAVs) operating in dense urban airspace, this paper presents an A*-oriented comprehensive path-planning scheme for multiple connected UAVs that integrates a radio map and complex network. Existing research often lacks rigorous processing of environmental map data, while the traditional A* algorithm struggles to simultaneously handle constraints such as obstacle avoidance, flight maneuverability, and multi-UAV path conflicts. To overcome these limitations, this study first constructs a path-planning model based on complex-network theory using environmental data and the radio map, clarifying the separation of responsibilities between environment representation and algorithmic search. On this basis, we propose an improved A* algorithm for multi-UAV scenarios termed MURM-A*. Simulation results demonstrate that the proposed algorithm effectively avoids collisions with obstacles, adheres to UAV flight dynamics, and prevents spatial conflicts between multi-UAV paths, while achieving a joint optimization between path efficiency and radio quality. In terms of performance comparison, the proposed algorithm performs comparably to traditional A* while ensuring operational validity, and exhibits a slight increase in flight time but a substantial reduction in radio-outage time compared to the Deep Reinforcement Learning (DRL) method. Furthermore, employing the path-planning model enables the algorithm to more accurately identify environmental information compared to directly using raw environmental maps. The modeling time is also notably shorter than the training time required for DRL methods. This study provides a well-structured and extensible systematic framework for reliable path planning of multiple cellular-connected UAVs in complex radio environments.
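
The baseline A* search that MURM-A* builds on can be sketched as follows: a 4-connected grid with unit step costs and a Manhattan heuristic. The paper's radio-map and multi-UAV conflict constraints are not modeled here; soft costs like radio quality could be folded into the step cost:

```python
import heapq

def a_star(grid, start, goal):
    """A* on a 4-connected grid; cells equal to 1 are obstacles."""
    rows, cols = len(grid), len(grid[0])
    def h(p):                                  # admissible Manhattan heuristic
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(h(start), 0, start, [start])]  # (f, g, node, path)
    seen = set()
    while open_set:
        f, g, cur, path = heapq.heappop(open_set)
        if cur == goal:
            return path
        if cur in seen:
            continue
        seen.add(cur)
        r, c = cur
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nxt = (nr, nc)
                if nxt not in seen:
                    heapq.heappush(open_set,
                                   (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None                                 # no path exists

grid = [[0, 0, 0],
        [1, 1, 0],   # wall forces a detour
        [0, 0, 0]]
path = a_star(grid, (0, 0), (2, 0))
```

With an admissible heuristic and unit costs, the first time the goal is popped the path is optimal; here the only route around the wall visits 7 cells.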

  • Research Article
  • 10.3390/telecom7010015
Reconfigurable Wireless Channel Optimization and Low-Complexity Control Methods Driven by Intelligent Metasurfaces 2.0
  • Feb 2, 2026
  • Telecom
  • Xiaoguang Hu + 3 more

With the evolution of Reconfigurable Intelligent Surface (RIS) technology, its potential for dynamically optimizing wireless channels has garnered significant attention. However, existing methods still face challenges in real-time control in complex environments due to high computational complexity. To address this, this paper proposes a reconfigurable wireless channel optimization framework based on Intelligent Metasurfaces 2.0 and designs a low-complexity control strategy. The strategy integrates an adaptive adjustment mechanism and multi-dimensional feedback, aiming to reduce system computational load. Experimental results show that compared to traditional methods (such as MRC and MMSE), the proposed method improves signal transmission quality (SNR improvement of 3.8 dB) and system stability (exponential increase to 0.92). When compared to advanced deep reinforcement learning (DRL) and graph neural network (GNN) methods, it achieves similar signal quality while reducing computational overhead by 20.0% and energy consumption by approximately 32.4%. Ablation experiments further verify the effectiveness and synergistic role of the proposed core modules. This study provides a feasible approach toward high-efficiency, low-complexity dynamic channel optimization in 5G and future communication networks.

  • Research Article
  • 10.1007/s44230-026-00135-8
Dynamic Reward-Guided with Multi-Head Attention for Actor-Critic Policy Learning Optimization
  • Feb 2, 2026
  • Human-Centric Intelligent Systems
  • Xiaohui Huang + 3 more

In offline reinforcement learning, model-based approaches have demonstrated superior data efficiency by leveraging learned dynamics models to generate additional training samples. However, due to inevitable model inaccuracies, directly deriving policies from such models often leads to suboptimal performance under the constraints of the offline setting. Prior work has attempted to mitigate this issue by adopting conservative strategies that avoid reliance on out-of-distribution transitions. Nevertheless, these methods still face notable challenges, as dynamics models trained solely on historical data typically struggle to generalize to unseen state-action pairs. In this paper, we propose a novel offline reinforcement learning method, Dynamic Reward-Guided Multi-Head Attention for Actor-Critic Policy Learning Optimization (DRMAAC). DRMAAC introduces a dynamic-aware paradigm that focuses on capturing the intrinsic characteristics of the behavior policy. It leverages inverse reinforcement learning to recover a reward-consistent dynamics model and identify high-return states. Meanwhile, an Actor-Critic architecture enhanced with multi-head attention makes decisions guided by these high-value states. This integration enables the model to better capture long-term dependencies and prioritize informative features in complex state spaces. Empirical evaluations on the D4RL benchmark show that DRMAAC consistently outperforms previous state-of-the-art methods across a variety of tasks. These results highlight not only improved data efficiency but also strong generalization capabilities under diverse environmental conditions. Overall, DRMAAC presents a promising direction for advancing model-based offline reinforcement learning by combining attention mechanisms with reward-consistent dynamics modeling.

  • Research Article
  • 10.1007/s11071-025-12095-y
Event-triggered optimal consensus for discrete-time nonlinear multiagent systems with DoS attacks via reinforcement learning method
  • Feb 1, 2026
  • Nonlinear Dynamics
  • Yujie Liao + 3 more


  • Research Article
  • 10.1016/j.hcl.2025.08.002
Using Tree-Based Reinforcement Learning Methods to Support Personalized Decision-Making in Hand Treatment.
  • Feb 1, 2026
  • Hand clinics
  • Yao Song + 1 more


  • Research Article
  • 10.1016/j.asr.2025.11.107
Hybrid deep reinforcement learning and indirect method for low-thrust trajectory optimization in cislunar space
  • Feb 1, 2026
  • Advances in Space Research
  • Izhar Ul Haq + 3 more


  • Research Article
  • 10.1016/j.cor.2026.107426
Multi-Attribute Utility Deep Reinforcement Learning method for sequential multi-criteria decision problems: Application to human resource planning
  • Feb 1, 2026
  • Computers & Operations Research
  • Mohammadreza Nematollahi + 4 more


  • Research Article
  • Cited by 1
  • 10.1016/j.trc.2025.105453
Learning to reschedule platforms: A graph neural network based deep reinforcement learning method for the train platforming and rescheduling problem
  • Feb 1, 2026
  • Transportation Research Part C: Emerging Technologies
  • Hongxiang Zhang + 5 more




Copyright 2026 Cactus Communications. All rights reserved.
