Articles published on Reinforcement Learning Methods
3547 Search results
- New
- Research Article
- 10.14254/jsdtl.2026.11-1.01
- Mar 3, 2026
- Journal of Sustainable Development of Transport and Logistics
- Tetiana Kashtalian
Purpose. This study aims to synthesise empirical and modelling evidence on inventory optimisation methods for raw materials, work-in-process, and finished goods in production and trading enterprises, and to translate that evidence into a practical, class-differentiated implementation framework deployable within standard warehouse management and enterprise resource planning systems.

Methodology. A systematic review and meta-analytic synthesis of 31 peer-reviewed studies published between 2004 and 2025 was conducted following the PRISMA 2020 protocol. A random-effects model estimated by restricted maximum likelihood was applied to pool percentage cost-reduction effect sizes across 18 studies admissible to quantitative synthesis, complemented by a narrative synthesis of the remaining 13 studies. Pre-specified subgroup and moderator analyses examined the role of inventory class, demand pattern, and network complexity as effect-size moderators.

Results. Distributional safety stock methods outperform classical normal approximations by a pooled mean of 9.3% (95% CI: 5.8–12.7%) at equivalent service levels, with the advantage being largest for high-variability SKU segments. Multi-echelon coordination yields a pooled mean cost reduction of 11.4% (95% CI: 6.9–15.9%), increasing significantly with network complexity and lead-time variability. Learning-based control methods deliver up to 16% cost reductions under complex network conditions but require substantial data and governance infrastructure. Commercial demand drivers systematically distort finished-goods inventory targets and require integration with sales-and-operations planning for accurate calibration.

Theoretical contribution. The study provides the first cross-class synthesis covering raw materials, work-in-process, and finished goods within a unified evaluative framework, positioning machine learning and deep reinforcement learning methods alongside classical policy families and quantifying the boundary conditions for each approach.

Practical implications. A six-phase, stepwise implementation framework is proposed, covering ABC-XYZ segmentation, forecast model selection, safety stock calibration, replenishment policy assignment, simulation-based parameter tuning, and KPI governance, enabling enterprises to achieve 9–16% reductions in inventory costs within existing WMS and ERP architectures.

Sustainable Development Goals (SDGs): SDG 8: Decent Work and Economic Growth; SDG 9: Industry, Innovation and Infrastructure; SDG 12: Responsible Consumption and Production; SDG 17: Partnerships for the Goals
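The headline comparison, distributional safety stock methods versus the classical normal approximation, can be made concrete with a minimal sketch. The gamma demand model and every parameter below are illustrative assumptions, not values from the study:

```python
# Minimal sketch: classical normal-approximation safety stock vs. a
# distributional (empirical-quantile) alternative. Demand model and all
# parameters are illustrative assumptions, not taken from the study.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
service_level = 0.95                   # target cycle service level
lead_time_days = 5
mean_daily, std_daily = 100.0, 60.0    # high-variability SKU (CV = 0.6)

# Classical approach: assume lead-time demand is normal.
mu_lt = lead_time_days * mean_daily
sigma_lt = np.sqrt(lead_time_days) * std_daily
ss_normal = stats.norm.ppf(service_level) * sigma_lt

# Distributional approach: assume a skewed demand distribution (here
# gamma, matched to the same mean and variance) and take the empirical
# quantile of simulated lead-time demand.
shape = (mean_daily / std_daily) ** 2
scale = std_daily ** 2 / mean_daily
lt_demand = rng.gamma(shape, scale, size=(100_000, lead_time_days)).sum(axis=1)
ss_distributional = np.quantile(lt_demand, service_level) - mu_lt

print(f"normal approximation : {ss_normal:8.1f}")
print(f"empirical quantile   : {ss_distributional:8.1f}")
```

For skewed demand the two values differ materially at the same service level; that miscalibration gap is what the distributional methods in the pooled studies exploit.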
- New
- Research Article
- 10.1109/jiot.2025.3565325
- Mar 1, 2026
- IEEE Internet of Things Journal
- Yuzhe Huang + 4 more
GRWS: A Deep Reinforcement Learning Method With Graph Attention Networks for Flexible Workflow Scheduling in Industrial Manufacturing Scenarios
- New
- Research Article
- 10.1016/j.neucom.2025.132578
- Mar 1, 2026
- Neurocomputing
- Yukang Cao + 3 more
LECMARL: A cooperative multi-agent reinforcement learning method based on lazy mechanisms and efficient exploration
- New
- Research Article
- 10.1016/j.engappai.2026.113761
- Mar 1, 2026
- Engineering Applications of Artificial Intelligence
- An Zhang + 3 more
A preference-based Reinforcement Learning method of maneuver decision-making in air combat
- New
- Research Article
- 10.1016/j.segan.2025.102075
- Mar 1, 2026
- Sustainable Energy, Grids and Networks
- Donguk Yang + 3 more
Optimal management of green hydrogen production in renewable energy systems using deep reinforcement learning methods
- New
- Research Article
- 10.1016/j.compchemeng.2025.109515
- Mar 1, 2026
- Computers & Chemical Engineering
- Maximilian Bloor + 3 more
A survey and tutorial of reinforcement learning methods in Process Systems Engineering
- New
- Research Article
- 10.3390/futuretransp6020056
- Feb 28, 2026
- Future Transportation
- Edwin M Kataka + 3 more
Traditional macroscopic fundamental diagram (MFD)-based traffic perimeter metering control strategies rely on full knowledge of vehicle accumulation and inter-regional flow dynamics, assumptions that seldom hold in heterogeneous and highly variable real-world networks. Classical data-driven reinforcement learning methods face similar constraints, often converging slowly and exhibiting low sample efficiency when confronted with such complexities. Motivated by these limitations, this paper proposes a Parameterized Deep Q-Network perimeter control (P-DQNPC) scheme designed for multi-region urban road networks. The framework jointly optimizes discrete actions (regional routing choices) and continuous actions (signal-timing or flow-duration regulation) within a model-free learning structure. The approach is first trained and validated on synthetic MFD data to establish stable and interpretable policy behavior under controlled conditions. It is then transferred and further evaluated using real-world measurements from the Performance Measurement System, San Francisco Bay Area (PeMS-SF), a dataset collected from 18,954 loop detectors across the California State Highway System. PeMS-SF is selected for its high spatial and temporal resolution, broad network coverage, and ability to capture realistic and diverse congestion patterns, qualities that support both rigorous validation and generalization to other metropolitan regions. Experimental results show that P-DQNPC consistently outperforms state-of-the-art baselines, including deep deterministic policy gradient, deep Q-network, and no-control schemes. The proposed method achieves superior regulation of regional accumulations and demonstrates enhanced robustness in large, heterogeneous, and uncertain urban traffic environments.
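For readers unfamiliar with parameterized deep Q-networks, a minimal sketch of the hybrid discrete-continuous action selection that P-DQN-style methods use (network sizes, the sigmoid bound, and the traffic-state encoding are illustrative assumptions, not the paper's architecture):

```python
# Minimal sketch of the parameterized-action idea behind P-DQN: a
# parameter network proposes a continuous parameter for every discrete
# action, and a Q-network scores each (state, parameters) pair.
import torch
import torch.nn as nn

STATE_DIM, N_DISCRETE, PARAM_DIM = 8, 3, 1   # e.g. 3 routing choices,
                                             # 1 signal-timing parameter each

param_net = nn.Sequential(                   # state -> params for all actions
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, N_DISCRETE * PARAM_DIM), nn.Sigmoid(),  # bounded timings
)
q_net = nn.Sequential(                       # (state, all params) -> Q values
    nn.Linear(STATE_DIM + N_DISCRETE * PARAM_DIM, 64), nn.ReLU(),
    nn.Linear(64, N_DISCRETE),
)

def act(state: torch.Tensor):
    """Greedy hybrid action: discrete choice plus its continuous parameter."""
    params = param_net(state)                 # (N_DISCRETE * PARAM_DIM,)
    q = q_net(torch.cat([state, params]))     # (N_DISCRETE,)
    k = int(q.argmax())
    return k, params.view(N_DISCRETE, PARAM_DIM)[k]

state = torch.randn(STATE_DIM)               # stand-in for regional accumulations
print(act(state))
```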
- New
- Research Article
- 10.3390/iot7010023
- Feb 27, 2026
- IoT
- Ernando P Batista + 4 more
The growing demand for IoT applications in highly dynamic environments with many connected devices introduces significant scalability and low-latency challenges. In the context of software-defined networking (SDN) integrated with Edge environments, machine learning (ML) techniques have emerged as a promising approach to meet these requirements. This study presents a Systematic Literature Review (SLR) that identifies and analyzes ML-based solutions applied to Software-Defined Internet of Things (SD-IoT) infrastructures in Edge environments, with an emphasis on improving latency and scalability. The review followed established methodological best practices, including a clear definition of research questions, well-defined inclusion and exclusion criteria, a structured search protocol, and searches across multiple scientific databases. Based on the analysis of the selected studies, the main strategies employed to enhance network performance are categorized, the fidelity and complexity of the experimental environments used are characterized, and the realism and applicability of the proposed solutions are discussed. Furthermore, drawing from the context of the selected studies, the most recurrent ML approaches are presented, including supervised, unsupervised, and reinforcement learning methods, along with a discussion of their advantages and limitations in dynamic network scenarios. By compiling and organizing the contributions from the literature, this paper provides a comprehensive overview of the state of the art in applying ML to SD-IoT networks, shedding light on current trends, existing gaps, and research opportunities aimed at building more intelligent and adaptable solutions for IoT environments.
- New
- Research Article
- 10.1007/s12530-026-09799-w
- Feb 17, 2026
- Evolving Systems
- Gauri Kalnoor + 1 more
The rapid growth of connected medical devices generates massive volumes of heterogeneous health data that must be processed and transmitted in real time. In such environments, minimizing latency and energy consumption remains a critical challenge for next-generation health monitoring systems. Existing reinforcement learning and optimization methods for intelligent communication networks face several challenges, including slow convergence, high computational overhead, and inefficiency in handling task prioritization. To resolve these issues, this work develops a chaotic dung beetle optimization-boosted multi-agent deep reinforcement learning framework that jointly optimizes communication reliability, computational efficiency, and task prioritization. A reward function is designed to jointly minimize delay, energy usage, and system cost while preserving information freshness. Specifically, the dung beetle optimization process is combined with a piecewise linear chaotic map to enhance population diversity, which significantly improves search-space exploration and leads to faster convergence and higher solution quality. The proposed algorithm enhances the exploration capability of multi-agent deep reinforcement learning through the integration of chaotic dung beetle optimization, enabling more accurate and reliable decision-making in real-world applications. Extensive experiments demonstrate that the proposed model achieves superior performance compared to baseline algorithms: it reaches an accuracy of over 97.00% with rapid convergence, reduces system cost under varying health-data sizes and Medical Internet of Things device counts, and maintains robust scalability across diverse workloads. Moreover, the model achieves significant reductions in communication latency and energy consumption as central processing unit cycles and bandwidth increase, while effectively prioritizing high-criticality tasks.
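The piecewise linear chaotic map (PWLCM) mentioned in the abstract is a standard construction; a minimal sketch of using it to seed a more diverse optimizer population (the bounds, population size, and seed are illustrative assumptions, not the paper's configuration):

```python
# Minimal sketch of PWLCM-based population initialization, the diversity
# mechanism the abstract combines with dung beetle optimization.
import numpy as np

def pwlcm(x: float, p: float = 0.4) -> float:
    """One PWLCM iteration on (0, 1); p is the control parameter."""
    if x < p:
        return x / p
    if x <= 0.5:
        return (x - p) / (0.5 - p)
    return pwlcm(1.0 - x, p)          # the map is symmetric about 0.5

def chaotic_population(n: int, dim: int, lo: float, hi: float, seed: float = 0.37):
    """Initialize n candidate solutions from a chaotic sequence instead of
    uniform random draws, covering the search space more evenly."""
    pop = np.empty((n, dim))
    x = seed
    for i in range(n):
        for j in range(dim):
            x = pwlcm(x)
            pop[i, j] = lo + x * (hi - lo)
    return pop

print(chaotic_population(4, 3, -5.0, 5.0))
```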
- New
- Research Article
- 10.1080/00295639.2025.2598170
- Feb 7, 2026
- Nuclear Science and Engineering
- Julia Bartos + 5 more
Recent advancements in machine learning (ML) algorithms and applications have made it possible for ML models to solve complex problems, such as reactor core loading optimization, which represents a multiobjective optimization problem with a high degree of freedom. This study aims to provide a proof of concept for an ML-based core loading optimization scheme aimed at research reactors. As a case study, we selected the High Flux Reactor in Petten, the Netherlands. Two optimization algorithms are used in this study: a genetic algorithm (GA) and reinforcement learning (RL). The goal is to increase the thermal neutron flux at specific locations in the reactor core while adhering to established safety constraints. The optimization schemes also utilize neural network-based surrogate models to substitute for the computationally intensive core calculations. The surrogate models predict core parameters (such as the neutron flux, control rod position, and heat flux) for any given loading pattern. Our results show that ML-based core loading optimization has the potential to become a viable alternative to traditional core optimization methods. Both the GA and RL methods were able to generate core loading patterns in which the neutron flux at most target locations was similar to that obtained with the traditional method.
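A minimal sketch of the surrogate-assisted search loop the abstract describes, with a toy stand-in for the neural-network surrogate and a penalty for the safety constraint (the encoding, surrogate, and constraint are illustrative assumptions, not the High Flux Reactor model):

```python
# Minimal sketch: a genetic algorithm searches over loading patterns
# while a surrogate (here a stand-in function) predicts flux and a
# safety proxy, avoiding expensive core calculations in the loop.
import numpy as np

rng = np.random.default_rng(1)
N_POSITIONS, POP, GENS = 12, 40, 50   # toy core: permutation of 12 assemblies

def surrogate(pattern: np.ndarray) -> tuple[float, float]:
    """Stand-in for the neural-network surrogate: returns (target flux,
    peak heat-flux proxy) for a loading pattern."""
    flux = float(np.dot(pattern, np.linspace(1.0, 0.2, N_POSITIONS)))
    peak = float(pattern[:3].max())               # proxy safety parameter
    return flux, peak

def fitness(pattern: np.ndarray) -> float:
    flux, peak = surrogate(pattern)
    return flux - 100.0 * max(0.0, peak - 10.0)   # penalize safety violations

pop = [rng.permutation(N_POSITIONS) + 1 for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    elite = pop[: POP // 2]
    children = []
    for parent in elite:                          # mutation: swap two positions
        child = parent.copy()
        i, j = rng.choice(N_POSITIONS, 2, replace=False)
        child[i], child[j] = child[j], child[i]
        children.append(child)
    pop = elite + children

print("best pattern:", pop[0], "fitness:", round(fitness(pop[0]), 2))
```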
- New
- Research Article
- 10.1007/s00500-025-10997-y
- Feb 7, 2026
- Soft Computing
- Saeed Saeedvand + 3 more
A hierarchical deep reinforcement learning method for dragging and adjusting objects with dual-arm robot
- New
- Research Article
- 10.1142/s2301385027500701
- Feb 6, 2026
- Unmanned Systems
- Pingping Qu + 7 more
Efficient resource allocation for unmanned aerial vehicle (UAV) swarms is a critical challenge, complicated by severe interference between UAV-to-UAV (U2U) and UAV-to-infrastructure (U2I) communications. Traditional Multi-Agent Reinforcement Learning (MARL) methods often prove insufficient in this domain due to two fundamental limitations: the policy sacrifice phenomenon, wherein uncoordinated agent competition leads to suboptimal outcomes, and the curse of dimensionality, which impedes effective learning in large swarms. To address these limitations, this paper proposes the Attention-based and Dynamic Gateway Multi-Agent Soft Actor-Critic (ADG-MASAC), a novel MARL framework. Our approach integrates a dynamic gateway mechanism to convert chaotic competition into structured collaboration via dynamic role assignment and an attention-based critic to enable precise perception of high-dimensional global states. Experimental results demonstrate that ADG-MASAC not only resolves the policy sacrifice issue but also achieves substantial performance gains in both U2U and U2I communications. Ablation studies further confirm that the synergy between these two mechanisms is essential for the algorithm’s success.
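The attention-based critic is the part of ADG-MASAC a short sketch can make concrete: instead of flattening the whole swarm state, the critic attends over per-UAV embeddings (the dimensions and single attention layer are illustrative assumptions, not the paper's architecture):

```python
# Minimal sketch of an attention-based critic for a UAV swarm: each
# agent's embedding attends to the rest of the swarm before the Q head,
# sidestepping a fixed flattened global-state vector.
import torch
import torch.nn as nn

N_UAVS, OBS_DIM, EMB = 16, 10, 32

embed = nn.Linear(OBS_DIM, EMB)
attn = nn.MultiheadAttention(EMB, num_heads=4, batch_first=True)
q_head = nn.Linear(EMB, 1)

def critic(obs: torch.Tensor) -> torch.Tensor:
    """obs: (batch, N_UAVS, OBS_DIM) -> per-agent Q estimates (batch, N_UAVS)."""
    e = embed(obs)                    # (batch, N_UAVS, EMB)
    ctx, _ = attn(e, e, e)            # each agent attends to the swarm
    return q_head(ctx).squeeze(-1)

obs = torch.randn(2, N_UAVS, OBS_DIM)
print(critic(obs).shape)              # torch.Size([2, 16])
```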
- Research Article
- 10.3390/s26030965
- Feb 2, 2026
- Sensors (Basel, Switzerland)
- Yanming Chai + 4 more
To fulfill the dual requirements of persistent cellular-network connectivity and flight safety for cellular-connected Unmanned Aerial Vehicles (UAVs) operating in dense urban airspace, this paper presents an A*-oriented comprehensive path-planning scheme for multiple connected UAVs that integrates a radio map and complex-network theory. Existing research often lacks rigorous processing of environmental map data, while the traditional A* algorithm struggles to simultaneously handle constraints such as obstacle avoidance, flight maneuverability, and multi-UAV path conflicts. To overcome these limitations, this study first constructs a path-planning model based on complex-network theory using environmental data and the radio map, clarifying the separation of responsibilities between environment representation and algorithmic search. On this basis, we propose an improved A* algorithm for multi-UAV scenarios, termed MURM-A*. Simulation results demonstrate that the proposed algorithm effectively avoids collisions with obstacles, adheres to UAV flight dynamics, and prevents spatial conflicts between multi-UAV paths, while jointly optimizing path efficiency and radio quality. In performance comparisons, the proposed algorithm differs only marginally from traditional A* while ensuring operational validity, and exhibits a slight increase in flight time but a substantial reduction in radio-outage time compared with the Deep Reinforcement Learning (DRL) method. Furthermore, the path-planning model enables the algorithm to identify environmental information more accurately than directly using raw environmental maps, and its modeling time is notably shorter than the training time required by DRL methods. This study provides a well-structured and extensible systematic framework for reliable path planning of multiple cellular-connected UAVs in complex radio environments.
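A minimal sketch of the general idea: ordinary A* whose edge cost blends path length with a radio-map outage penalty, so the search trades distance against connectivity (the grid, radio map, and weight are illustrative assumptions, not the MURM-A* formulation):

```python
# Minimal sketch: A* on a grid where each step costs
# 1 + LAM * outage probability of the entered cell.
import heapq
import numpy as np

rng = np.random.default_rng(2)
H, W, LAM = 20, 20, 5.0
outage = rng.random((H, W)) * 0.3          # stand-in radio-outage probability
obstacles = rng.random((H, W)) < 0.1
obstacles[0, 0] = obstacles[H - 1, W - 1] = False   # keep endpoints free

def astar(start, goal):
    open_q = [(0.0, start)]
    g = {start: 0.0}
    came = {}
    while open_q:
        _, cur = heapq.heappop(open_q)
        if cur == goal:                    # reconstruct path
            path = [cur]
            while cur in came:
                cur = came[cur]
                path.append(cur)
            return path[::-1]
        y, x = cur
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if not (0 <= ny < H and 0 <= nx < W) or obstacles[ny, nx]:
                continue
            cost = g[cur] + 1.0 + LAM * outage[ny, nx]
            if cost < g.get((ny, nx), float("inf")):
                g[(ny, nx)] = cost
                came[(ny, nx)] = cur
                h = abs(goal[0] - ny) + abs(goal[1] - nx)   # admissible heuristic
                heapq.heappush(open_q, (cost + h, (ny, nx)))
    return None                            # no connected path exists

print(astar((0, 0), (H - 1, W - 1)))
```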
- Research Article
- 10.3390/telecom7010015
- Feb 2, 2026
- Telecom
- Xiaoguang Hu + 3 more
With the evolution of Reconfigurable Intelligent Surface (RIS) technology, its potential for dynamically optimizing wireless channels has garnered significant attention. However, existing methods still face challenges in real-time control in complex environments due to high computational complexity. To address this, the paper proposes a reconfigurable wireless channel optimization framework based on Intelligent Metasurfaces 2.0 and designs a low-complexity control strategy. The strategy integrates an adaptive adjustment mechanism and multi-dimensional feedback, aiming to reduce the system's computational load. Experimental results show that, compared to traditional methods (such as MRC and MMSE), the proposed method improves signal transmission quality (an SNR improvement of 3.8 dB) and system stability (an increase to 0.92). When compared to advanced deep reinforcement learning (DRL) and graph neural network (GNN) methods, it achieves similar signal quality while reducing computational overhead by 20.0% and energy consumption by approximately 32.4%. Ablation experiments further verify the effectiveness and synergistic role of the proposed core modules. This study provides a feasible approach toward high-efficiency, low-complexity dynamic channel optimization in 5G and future communication networks.
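For context, the textbook low-complexity RIS baseline co-phases the cascaded channel so all reflections add coherently at the receiver; a minimal sketch under an assumed Rayleigh channel model (this is the classical baseline, not the paper's Intelligent Metasurfaces 2.0 strategy):

```python
# Minimal sketch of RIS phase co-phasing: with per-element phase shifts
# theta_n = -angle(h_n * g_n), every cascaded path adds in phase,
# maximizing |h^T diag(theta) g|^2 without any iterative learning.
import numpy as np

rng = np.random.default_rng(3)
N = 64                                            # RIS elements (assumption)
h = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)  # BS->RIS
g = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)  # RIS->user

theta_rand = np.exp(1j * rng.uniform(0, 2 * np.pi, N))   # unoptimized surface
theta_opt = np.exp(-1j * np.angle(h * g))                # co-phasing solution

for name, theta in [("random", theta_rand), ("co-phased", theta_opt)]:
    gain = abs(np.sum(h * theta * g)) ** 2
    print(f"{name:10s} channel gain = {gain:8.1f}")
```

The co-phased configuration is the kind of closed-form, per-element rule that low-complexity strategies build on, in contrast to the DRL and GNN baselines the paper compares against.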
- Research Article
- 10.1007/s44230-026-00135-8
- Feb 2, 2026
- Human-Centric Intelligent Systems
- Xiaohui Huang + 3 more
In offline reinforcement learning, model-based approaches have demonstrated superior data efficiency by leveraging learned dynamics models to generate additional training samples. However, due to inevitable model inaccuracies, directly deriving policies from such models often leads to suboptimal performance under the constraints of the offline setting. Prior work has attempted to mitigate this issue by adopting conservative strategies that avoid reliance on out-of-distribution transitions. Nevertheless, these methods still face notable challenges, as dynamics models trained solely on historical data typically struggle to generalize to unseen state-action pairs. In this paper, we propose a novel offline reinforcement learning method, Dynamic Reward-Guided Multi-Head Attention for Actor-Critic Policy Learning Optimization (DRMAAC). DRMAAC introduces a dynamics-aware paradigm that focuses on capturing the intrinsic characteristics of the behavior policy. It leverages inverse reinforcement learning to recover a reward-consistent dynamics model and identify high-return states, while an Actor-Critic architecture enhanced with multi-head attention makes decisions guided by these high-value states. This integration enables the model to better capture long-term dependencies and prioritize informative features in complex state spaces. Empirical evaluations on the D4RL benchmark show that DRMAAC consistently outperforms previous state-of-the-art methods across a variety of tasks. These results highlight not only improved data efficiency but also strong generalization under diverse environmental conditions. Overall, DRMAAC presents a promising direction for advancing model-based offline reinforcement learning by combining attention mechanisms with reward-consistent dynamics modeling.
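A minimal sketch of the "attend to high-return states" idea: the current state cross-attends to a small memory of previously identified high-value states before the value head (the memory size, dimensions, and value head are illustrative assumptions, not the DRMAAC architecture):

```python
# Minimal sketch: a critic whose current-state query attends over a
# memory of high-return states via multi-head attention, so the value
# estimate is conditioned on known high-value regions of the state space.
import torch
import torch.nn as nn

STATE_DIM, EMB, MEM = 17, 64, 32           # e.g. a D4RL locomotion state

encode = nn.Linear(STATE_DIM, EMB)
cross_attn = nn.MultiheadAttention(EMB, num_heads=8, batch_first=True)
value_head = nn.Linear(EMB, 1)

high_return_states = torch.randn(1, MEM, STATE_DIM)   # stand-in memory

def value(state: torch.Tensor) -> torch.Tensor:
    """state: (batch, STATE_DIM) -> value estimate (batch,)."""
    q = encode(state).unsqueeze(1)                     # (batch, 1, EMB)
    kv = encode(high_return_states).expand(state.shape[0], -1, -1)
    ctx, _ = cross_attn(q, kv, kv)                     # attend to the memory
    return value_head(ctx.squeeze(1)).squeeze(-1)

print(value(torch.randn(4, STATE_DIM)).shape)          # torch.Size([4])
```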
- Research Article
- 10.1007/s11071-025-12095-y
- Feb 1, 2026
- Nonlinear Dynamics
- Yujie Liao + 3 more
Event-triggered optimal consensus for discrete-time nonlinear multiagent systems with DoS attacks via reinforcement learning method
- Research Article
- 10.1016/j.hcl.2025.08.002
- Feb 1, 2026
- Hand clinics
- Yao Song + 1 more
Using Tree-Based Reinforcement Learning Methods to Support Personalized Decision-Making in Hand Treatment.
- Research Article
- 10.1016/j.asr.2025.11.107
- Feb 1, 2026
- Advances in Space Research
- Izhar Ul Haq + 3 more
Hybrid deep reinforcement learning and indirect method for low-thrust trajectory optimization in cislunar space
- Research Article
- 10.1016/j.cor.2026.107426
- Feb 1, 2026
- Computers & Operations Research
- Mohammadreza Nematollahi + 4 more
Multi-Attribute Utility Deep Reinforcement Learning method for sequential multi-criteria decision problems: Application to human resource planning
- Research Article
- 10.1016/j.trc.2025.105453
- Feb 1, 2026
- Transportation Research Part C: Emerging Technologies
- Hongxiang Zhang + 5 more
Learning to reschedule platforms: A graph neural network based deep reinforcement learning method for the train platforming and rescheduling problem