Articles published on Deep Reinforcement Learning
16012 Search results
- New
- Research Article
- 10.1142/s2424786326500167
- Mar 11, 2026
- International Journal of Financial Engineering
- Mukhazar Ahmad Khan + 2 more
This study applies deep reinforcement learning (DRL) models to stock market prediction and investment strategy optimization. We compare three DRL architectures, Deep Q-Network (DQN), Double Deep Q-Network (Double DQN), and Dueling Deep Q-Network (Dueling DQN), on the Pakistan Stock Exchange. We adopt an empirical approach based on a set of 30 stocks, combined with descriptive statistical analysis to evaluate representativeness. The results indicate that all three models earn positive profits, with Double DQN achieving the highest average profit on both the training and test datasets. We then attempt to mitigate potential overfitting and analyze the implications for trading strategies. In addition, we consider personality and nonconvex heuristic money and risk management in a mathematical agent-based model to study the foraging behavior of a population of interacting, moving agents in an assigned background.
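The Double DQN advantage reported in this abstract comes from decoupling action selection from action evaluation. A minimal numpy sketch with toy Q-tables standing in for the two networks (all values are assumptions, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Q-tables standing in for the online and target networks:
# rows = next states in a batch, columns = actions (e.g. buy / hold / sell).
q_online = rng.normal(size=(4, 3))
q_target = rng.normal(size=(4, 3))

rewards = np.array([1.0, -0.5, 0.2, 0.0])
gamma = 0.99

# Standard DQN: the target network both selects and evaluates the next
# action, which tends to overestimate Q-values.
dqn_targets = rewards + gamma * q_target.max(axis=1)

# Double DQN: the online network selects the action, the target network
# evaluates it -- this decoupling reduces the overestimation bias.
best_actions = q_online.argmax(axis=1)
ddqn_targets = rewards + gamma * q_target[np.arange(4), best_actions]

# Double DQN targets can never exceed the DQN targets on the same batch.
assert np.all(ddqn_targets <= dqn_targets)
```

The inequality at the end holds by construction: evaluating a possibly non-greedy action under the target network is bounded above by the target network's own maximum.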
- New
- Research Article
- 10.1016/j.aap.2026.108496
- Mar 10, 2026
- Accident Analysis &amp; Prevention
- Hassan Bin Tahir + 1 more
A deep reinforcement learning algorithm for optimizing safety and efficiency of traffic signals using traffic conflict technique and artificial intelligence-based video analytics.
- New
- Research Article
- 10.3390/ijgi15030114
- Mar 9, 2026
- ISPRS International Journal of Geo-Information
- Yuxuan Hu + 2 more
Urban commercial restructuring, driven by the closure of traditional supermarkets and the expansion of new-format superstores, creates a large-scale spatial reallocation challenge requiring scientific location-allocation methods. Traditional heuristic algorithms such as Genetic Algorithm (GA) struggle with discrete spatial optimization under 400+ candidate sites and complex geographic mask constraints: they converge slowly and easily fall into local optima. This study proposes a Deep Reinforcement Learning (DRL) framework named GeoPPO (Geospatial Proximal Policy Optimization) to address this gap. Using Xi’an’s retail restructuring as a case setting—427 candidate locations and multidimensional geographic features—the approach models spatial constraints via a gridded environment encoded as a five-channel state tensor. Key innovations include a dynamic action-constraint mechanism that masks invalid actions based on boundary rules and competition avoidance, and a curriculum learning strategy that enables stable convergence. The framework fills the need for methods that handle hard spatial constraints in large-scale location-allocation. Tests demonstrate rapid convergence within 1,000 epochs, achieving 75% average demand coverage—2.7% and 5.5% higher than GA and Particle Swarm Optimization (PSO), respectively. Ablation experiments confirm that Vanilla PPO without dynamic action masking fails to produce feasible solutions. The framework offers a feasible technical path for handling highly dynamic urban facility spatial configuration with geographic mask constraints.
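The dynamic action-constraint mechanism described above amounts to masking invalid logits before the policy samples a candidate site. A minimal sketch, assuming a numpy policy head and a boolean feasibility mask (both hypothetical stand-ins for GeoPPO's internals):

```python
import numpy as np

def masked_policy(logits, valid_mask):
    """Suppress invalid actions before sampling, as in action-masked PPO.

    logits: raw policy-network outputs, one per candidate site (assumed)
    valid_mask: boolean array; False = site violates boundary or
                competition-avoidance rules (assumed)
    """
    masked = np.where(valid_mask, logits, -np.inf)  # kill invalid logits
    z = masked - masked.max()                       # numerical stability
    probs = np.exp(z)                               # exp(-inf) -> 0.0
    return probs / probs.sum()

logits = np.array([2.0, 0.5, 1.0, -1.0])
mask = np.array([True, False, True, False])  # sites 1 and 3 are infeasible

probs = masked_policy(logits, mask)
assert probs[1] == 0.0 and probs[3] == 0.0   # masked sites get zero probability
```

Because masked logits become `-inf`, the softmax assigns them exactly zero probability, so the agent can never select an infeasible site, which is what makes the ablated "Vanilla PPO without masking" fail.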
- New
- Research Article
- 10.1038/s41598-026-40508-4
- Mar 9, 2026
- Scientific Reports
- Amirhossein Ghaemipour + 2 more
Distribution network reconfiguration (DNR) is one of the most widely employed methods for minimizing distribution network power losses with minimal investment. Most DNR methods rely on an accurate network model; this study instead employs deep reinforcement learning (DRL), a model-free approach. We propose a loop-based method for effectively managing the large action space, which simultaneously utilizes a modified Q-learning algorithm to account for inter-loop coupling effects. Additionally, an innovative replay method is utilized to enhance convergence speed. The approach has been tested on the IEEE 33-, 69-, and 119-bus distribution networks. The simulation results indicate that it significantly outperforms previous DNR methods, such as metaheuristic and mathematical techniques, in terms of computational time as well as final distribution network power loss and voltage deviation.
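The modified Q-learning at the heart of such approaches is still driven by the standard temporal-difference update. A toy tabular sketch for a single switch loop, with the action set, rewards, and episode structure all assumed for illustration (not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setting: one switch loop with 4 candidate open-switch
# positions (actions) and a scalar "loss reduction" reward per choice.
n_states, n_actions = 1, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.5, 0.0          # gamma=0: single-step reconfiguration episode
rewards = np.array([0.2, 1.0, 0.4, 0.1])  # assumed per-action loss reductions

# Standard Q-learning update, applied once per sampled action.
for _ in range(200):
    a = rng.integers(n_actions)              # exploratory action choice
    r = rewards[a]
    Q[0, a] += alpha * (r + gamma * Q[0].max() - Q[0, a])

# The highest-reward switch position is learned.
assert Q[0].argmax() == 1
```

With `gamma=0` each Q-value simply converges toward its action's expected reward; the paper's loop-based variant additionally couples updates across loops, which this single-loop toy omits.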
- New
- Research Article
- 10.1038/s41598-026-42953-7
- Mar 9, 2026
- Scientific Reports
- Hassan Fareed M Lahza + 6 more
Deep reinforcement learning for network resource optimization in MIMO-NOMA networks to maximize utilization with minimal overhead.
- New
- Research Article
- 10.3390/s26051704
- Mar 8, 2026
- Sensors
- Lili Yin + 3 more
In the context of smart manufacturing, with the widespread deployment of Industrial Internet of Things (IoT) devices, a large number of computation tasks that are highly sensitive to latency and have strict deadlines have emerged, requiring real-time processing. Effectively offloading tasks to address the issues of increased latency and task dropouts caused by dynamic changes in edge node load has become a key challenge in the cloud–edge–end collaborative environment of smart manufacturing. To tackle the complex issues of unknown edge node loads and dynamic system state changes, this paper proposes a distributed algorithm based on deep reinforcement learning, utilizing convolutional neural networks (CNN) and the Informer architecture. The proposed algorithm leverages CNN to extract local features of edge node loads while utilizing Informer’s self-attention mechanism to capture long-term load variation trends, thereby effectively handling the uncertainty and dynamics inherent in node loads. Furthermore, by integrating the Dueling Deep Q-Network (DQN) and Double DQN techniques, the algorithm achieves a precise approximation of the state–action value function, further enhancing its capability to perceive system temporal characteristics and adapt to heterogeneous tasks. Each mobile device can independently make task offloading decisions and scheduling strategies based on its observations, enabling dynamic task allocation and optimization of execution order. Simulation results show that, compared to various existing algorithms, the proposed method reduces task dropout rates by 82.3–94% and average latency by 28–39.2%. Experimental results validate the significant advantages of this method in intelligent manufacturing scenarios with high load and latency-sensitive tasks.
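The Dueling DQN head mentioned above splits the Q-function into a state value and per-action advantages, recombined with a mean-subtraction term that makes the decomposition identifiable. A minimal sketch with assumed head outputs:

```python
import numpy as np

def dueling_q(value, advantages):
    """Dueling head: Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a').

    Subtracting the mean advantage pins down the otherwise
    unidentifiable split between V and A.
    """
    return value + advantages - advantages.mean()

# Hypothetical head outputs for one offloading decision (3 actions:
# execute locally, offload to edge, offload to cloud).
v = 2.0
adv = np.array([0.5, 1.5, -2.0])

q = dueling_q(v, adv)
# The mean of Q over actions recovers the state value V(s).
assert abs(q.mean() - v) < 1e-12
```

In the paper's algorithm this head is combined with Double DQN targets; the dueling split helps when the value of a state (e.g. an overloaded edge node) matters more than the fine-grained ranking of actions.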
- New
- Research Article
- 10.1080/19393555.2026.2632663
- Mar 7, 2026
- Information Security Journal: A Global Perspective
- Syed Hussain + 5 more
ABSTRACT Zero-day attacks exploit previously undetected vulnerabilities and therefore evade existing signature-based intrusion detection systems, which rely on established attack patterns. Although machine learning-based detection has improved, current methods are not flexible enough to handle new threats and suffer from high false-positive rates. In this paper, a Deep Reinforcement Learning (DRL) framework is proposed that formulates zero-day attack detection as a Markov Decision Process (MDP), supporting adaptive learning without any attack signatures. We apply and compare three DRL algorithms, Deep Q-Network (DQN), Proximal Policy Optimization (PPO), and Advantage Actor-Critic (A2C), with a new feature engineering method that combines Principal Component Analysis with Information Gain selection. The framework is tested on several benchmark datasets (NSL-KDD, CICIDS2017, CIC-AndMal2017) and a zero-day attack dataset created specifically for this research. Experimental findings show that the DQN model attains 91.7% accuracy and an 83.4% detection rate on previously unseen attacks, 14.7% and 8.9% better than traditional machine learning and deep learning baselines, respectively. An ablation study indicates that the exploration strategy plays a critical role in zero-day detection, with its removal resulting in a 10.2% reduction in detection rates. The proposed framework offers greater flexibility against different types of attacks while maintaining a lower false-positive rate (8.2%) than traditional methods. The work contributes to cybersecurity defense by demonstrating that DRL is a useful paradigm for detecting unknown threats in dynamic network settings.
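The ablation result on exploration corresponds to the epsilon-greedy term typically paired with DQN. A minimal sketch of a decayed epsilon-greedy policy (the schedule constants are assumptions, not the paper's values):

```python
import math
import random

def decayed_epsilon(step, eps_start=1.0, eps_end=0.05, decay=1e-3):
    """Exponentially decayed exploration rate: explore heavily early on,
    settle to a small floor late in training."""
    return eps_end + (eps_start - eps_end) * math.exp(-decay * step)

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon take a random action, otherwise the greedy one.

    The ablation above suggests this exploration term is what lets the
    agent keep probing traffic it has never seen (zero-day behaviour).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Early in training the agent explores almost every step...
assert abs(decayed_epsilon(0) - 1.0) < 1e-12
# ...and late in training it is almost purely greedy.
assert abs(decayed_epsilon(10**6) - 0.05) < 1e-6
```

Removing the epsilon term (`epsilon=0` throughout) collapses the policy to pure exploitation of known patterns, which is consistent with the reported 10.2% drop in detection of unseen attacks.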
- New
- Research Article
- 10.3390/app16052576
- Mar 7, 2026
- Applied Sciences
- Le Dinh Nghiem + 3 more
Optimizing traffic performance using artificial intelligence (AI) has consistently been a prominent direction in the development of intelligent transportation systems. While numerous studies have proposed methodologies for integrating cooperative connected and autonomous vehicles (CCAVs) with traffic signal systems via V2X communication, they often rely on simplified control strategies or lack effective coordination between signal timing and vehicle behavior. In this study, we propose a novel traffic signal control strategy integrated with CCAVs using deep reinforcement learning. Our key differentiation lies in simultaneously optimizing signal phases with the Soft Actor–Critic (SAC) algorithm and regulating CCAVs via cooperative adaptive cruise control and Green Light Optimal Speed Advisory. This dual approach allows the signal controller to leverage rich state information from CCAVs and the road infrastructure, enabling more anticipatory and cooperative decisions. The proposed approach is implemented and evaluated across various scenarios using the Simulation of Urban MObility (SUMO) platform. The results demonstrate the superior learning performance and robustness of the proposed model; in particular, it reduces average vehicle waiting time by over 80% compared to baseline models under high-demand scenarios (4800–6000 veh/h). These findings underscore the critical importance of joint optimization in future intelligent transportation systems, paving the way for more resilient urban traffic management.
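The Soft Actor–Critic algorithm used here maximizes reward plus an entropy bonus on the policy, which keeps signal-phase decisions stochastic enough to explore. A toy illustration of the entropy term (the probabilities and the temperature `alpha` are assumptions for illustration):

```python
import math

def entropy(probs):
    """Shannon entropy H(pi) = -sum_a pi(a) log pi(a) of a discrete policy."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# SAC's objective augments the reward with alpha * H(pi(.|s)).
alpha = 0.2
probs_uniform = [0.25] * 4                 # fully exploratory phase policy
probs_peaked = [0.97, 0.01, 0.01, 0.01]    # nearly deterministic phase policy

# The entropy bonus rewards keeping the signal-phase policy stochastic,
# which drives exploration during training.
assert entropy(probs_uniform) > entropy(probs_peaked)
bonus = alpha * entropy(probs_uniform)
```

As training progresses, SAC typically anneals `alpha` (or tunes it automatically), letting the phase policy sharpen once the controller has seen enough traffic states.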
- New
- Research Article
- 10.1177/09544070261423594
- Mar 4, 2026
- Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering
- Bo Yang + 2 more
To address occupant motion sickness arising from the path-tracking accuracy and stability control of autonomous vehicles under varying steering conditions, a multi-objective control strategy is proposed that uses deep reinforcement learning to adaptively tune the parameters of a linear quadratic regulator (LQR) controller. First, an integrated vehicle dynamics model and a tracking error model with an additional yaw moment were established. On this basis, a human–vehicle coupled motion sickness comfort model was developed by combining the six-degree-of-freedom subjective vertical conflict (6DOF-SVC) model with the Toyota finite element human body model. Second, an LQR controller was designed based on reinforcement learning: a deep Q-network (DQN) agent was integrated with a reward function covering occupant motion sickness comfort, stability, and path-tracking accuracy, guiding the agent toward trade-offs among the multiple performance objectives. Finally, joint simulations using CarSim and Simulink were conducted under two varying operating conditions, namely variable road curvature and variable road adhesion coefficients. Relative to LQR tuning strategies based on genetic algorithms and fuzzy algorithms respectively, the proposed strategy achieves notable improvements. In terms of tracking accuracy, the maximum lateral deviation is reduced by 39% and 31% under variable curvature and by 41% and 19% under variable adhesion, while the average lateral deviation decreases by 20% and 18%, and by 18% and 10%. Regarding lateral stability, the maximum sideslip angle is reduced by 27% and 16%, and by 28% and 13%, whereas the maximum front wheel steering angle decreases by 34% and 27%, and by 47% and 26%. In terms of ride comfort, the incidence of passenger motion sickness is reduced by 34% and 19%, and by 33% and 20%.
Finally, real-vehicle experiments further validate the effectiveness of the proposed deep reinforcement learning-based LQR parameter tuning strategy.
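The strategy above tunes the weight matrices of a standard discrete-time LQR controller. A minimal sketch of how a feedback gain follows from chosen Q/R weights, using Riccati fixed-point iteration on an assumed toy lateral-error model (not the paper's vehicle model; in the paper's scheme the DQN agent would adapt the weights online):

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=500):
    """Discrete-time LQR gain K (u = -K x) via fixed-point iteration of
    the discrete algebraic Riccati equation."""
    P = Q.copy()
    for _ in range(iters):
        BtPA = B.T @ P @ A
        P = Q + A.T @ P @ A - BtPA.T @ np.linalg.solve(R + B.T @ P @ B, BtPA)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Toy double-integrator lateral-error model (an assumption for illustration).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.diag([10.0, 1.0])   # weight on lateral error and error rate
R = np.array([[1.0]])      # weight on steering effort

K = lqr_gain(A, B, Q, R)
# Closed-loop eigenvalues must lie inside the unit circle (stable tracking).
assert np.all(np.abs(np.linalg.eigvals(A - B @ K)) < 1.0)
```

Raising the entries of `Q` relative to `R` trades steering effort for tighter tracking; the paper's reward function adds motion-sickness comfort as a third objective in that trade-off.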
- New
- Research Article
- 10.3390/ijfs14030067
- Mar 4, 2026
- International Journal of Financial Studies
- Yu-Heng Hsieh + 2 more
This study develops a novel AI-based trading framework designed to consistently generate profits across cyclical bullish and bearish futures markets. Unlike conventional strategies that rely on static rules or a single predictive model, the proposed framework introduces a dual-agent deep reinforcement learning (DRL) architecture, where one agent specializes in bullish conditions and the other in bearish conditions, while a trading decision selector dynamically predicts market regimes and allocates execution accordingly. This design enables the system to adapt to regime shifts and mitigate risks arising from market volatility and extreme events. Using Mini Taiwan Stock Exchange Index Futures (MTX) as a case study, a four-year historical backtest is conducted covering multiple disruptive periods, including the tax adjustment and the Russia–Ukraine conflict. The empirical results show that, under a monthly capital reset and loss-compensation rule with a fixed investment of TWD 500,000 per month, the proposed framework achieves an average cumulative return of 2240%, an annualized return of 109%, and a Sharpe ratio of 0.31, with the cumulative ROI exceeding twice the MTX index growth over the same period. Although the Sharpe ratio remains moderate, this outcome reflects the framework’s emphasis on directional trading and absolute return maximization, where profitable trades outweigh intermittent losses despite higher short-term volatility. These findings suggest that adaptive, regime-aware DRL architectures are particularly effective for futures trading in markets characterized by frequent trend reversals, offering both methodological innovation and practical applicability under realistic market conditions, with strong returns achieved at a moderate risk-adjusted level.
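At decision time, the dual-agent design described above reduces to routing each market state through a predicted regime. A minimal sketch with stub agents and a stub predictor (all names and the trend rule are assumptions, not the paper's components):

```python
# Hypothetical trading decision selector: route each day's state to the
# bull-market or bear-market agent based on a predicted regime.
def select_action(state, regime_predictor, bull_agent, bear_agent):
    regime = regime_predictor(state)
    agent = bull_agent if regime == "bull" else bear_agent
    return agent(state)

# Stub agents and predictor for illustration only.
bull = lambda s: "long"                               # bullish specialist
bear = lambda s: "short"                              # bearish specialist
predict = lambda s: "bull" if s["trend"] > 0 else "bear"

assert select_action({"trend": 1.2}, predict, bull, bear) == "long"
assert select_action({"trend": -0.4}, predict, bull, bear) == "short"
```

The point of the split is that each specialist agent only ever trains and acts on its own regime, so a regime shift changes which policy executes rather than forcing one policy to cover both.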
- New
- Research Article
- 10.14254/jsdtl.2026.11-1.01
- Mar 3, 2026
- Journal of Sustainable Development of Transport and Logistics
- Tetiana Kashtalian
Purpose. This study aims to synthesise empirical and modelling evidence on inventory optimisation methods for raw materials, work-in-process, and finished goods in production and trading enterprises, and to translate that evidence into a practical, class-differentiated implementation framework deployable within standard warehouse management and enterprise resource planning systems. Methodology. A systematic review and meta-analytic synthesis of 31 peer-reviewed studies published between 2004 and 2025 was conducted following the PRISMA 2020 protocol. A random-effects model estimated by restricted maximum likelihood was applied to pool percentage cost-reduction effect sizes across 18 studies admissible to quantitative synthesis, complemented by a narrative synthesis of the remaining 13 studies. Pre-specified subgroup and moderator analyses examined the role of inventory class, demand pattern, and network complexity as effect-size moderators. Results. Distributional safety stock methods outperform classical normal approximations by a pooled mean of 9.3% (95% CI: 5.8–12.7%) at equivalent service levels, with the advantage being largest for high-variability SKU segments. Multi-echelon coordination yields a pooled mean cost reduction of 11.4% (95% CI: 6.9–15.9%), increasing significantly with network complexity and lead-time variability. Learning-based control methods deliver up to 16% cost reductions under complex network conditions but require substantial data and governance infrastructure. Commercial demand drivers systematically distort finished-goods inventory targets and require integration with sales-and-operations planning for accurate calibration. Theoretical contribution. 
The study provides the first cross-class synthesis covering raw materials, work-in-process, and finished goods within a unified evaluative framework, positioning machine learning and deep reinforcement learning methods alongside classical policy families and quantifying the boundary conditions for each approach. Practical implications. A six-phase, stepwise implementation framework is proposed, covering ABC-XYZ segmentation, forecast model selection, safety stock calibration, replenishment policy assignment, simulation-based parameter tuning, and KPI governance, enabling enterprises to achieve 9–16% reductions in inventory costs within existing WMS and ERP architectures. Sustainable Development Goals (SDGs): SDG 8: Decent Work and Economic Growth; SDG 9: Industry, Innovation and Infrastructure; SDG 12: Responsible Consumption and Production; SDG 17: Partnerships for the Goals
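The classical normal-approximation safety stock that the distributional methods are benchmarked against can be written down directly. A minimal sketch (the service level, demand figures, and the lead-time-variance extension are illustrative assumptions, not values from the review):

```python
import math

def safety_stock(z, sigma_d, lead_time, sigma_lt=0.0, mean_d=0.0):
    """Classical normal-approximation safety stock:

    SS = z * sqrt(L * sigma_d^2 + d_bar^2 * sigma_L^2)

    z: service-level factor, sigma_d: per-period demand std. dev.,
    lead_time: mean lead time L, sigma_lt: lead-time std. dev.,
    mean_d: mean per-period demand.
    """
    return z * math.sqrt(lead_time * sigma_d**2 + mean_d**2 * sigma_lt**2)

# 95% service level (z ~ 1.645), daily demand sd = 20 units, lead time 9 days.
ss = safety_stock(z=1.645, sigma_d=20.0, lead_time=9.0)
assert round(ss) == 99   # 1.645 * 20 * sqrt(9) = 98.7
```

The review's finding is that distributional methods, which replace the normal assumption with the empirical demand distribution, beat this formula by a pooled 9.3% at the same service level, most strongly for high-variability SKUs.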
- New
- Research Article
- 10.1016/j.envres.2026.123795
- Mar 1, 2026
- Environmental Research
- Yunshu Bai + 8 more
Optimizing wastewater treatment through combined deep learning and deep reinforcement learning: Recent advances and future prospects.
- New
- Research Article
- 10.1016/j.neunet.2025.108264
- Mar 1, 2026
- Neural Networks: The Official Journal of the International Neural Network Society
- Jing Hu + 9 more
RL-I2IT: Image-to-image translation with deep reinforcement learning.
- New
- Research Article
- 10.1088/2631-8695/ae49e7
- Mar 1, 2026
- Engineering Research Express
- Hongxu Li
Abstract To overcome the storage and computing bottlenecks of traditional centralized intrusion detection models under massive network traffic, and to improve identification accuracy and real-time performance for complex intrusion behaviors, this study designs a distributed Network Intrusion Detection (NID) model based on blockchain and Deep Reinforcement Learning (DRL). A requirement analysis first clarifies the core functions: system user management, intrusion detection, data transmission, and display. A distributed architecture is then built on CentOS 7 and Hadoop 2.8.0 across multiple servers, with system development completed in Visual C++ 6.0. The full-process model consists of five modules: data collection, preprocessing, storage, intrusion detection, and execution. Flume collects network logs and traffic data, the Hadoop Distributed File System (HDFS) provides secure storage of the full-process data, and DRL performs intelligent identification of and feedback on intrusion behaviors. Finally, the KDD99 dataset is used to conduct multidimensional experimental tests. The results show that the model achieves a detection accuracy above 90%, a False Positive Rate (FPR) below 0.5%, and a False Negative Rate (FNR) below 0.7% across dataset sizes ranging from 0.5 to 200,000. Relative to typical models, it performs better and adapts stably to data from various protocols, such as Transmission Control Protocol (TCP), User Datagram Protocol (UDP), and Internet Control Message Protocol (ICMP).
This study advances research on combining blockchain and reinforcement learning for network security, providing an efficient, trustworthy, and adaptive intrusion detection scheme for distributed networks and effectively improving the intelligence and reliability of network security protection.
- New
- Research Article
- 10.1016/j.compchemeng.2025.109520
- Mar 1, 2026
- Computers & Chemical Engineering
- Yee Hung Hong + 3 more
A risk-aware LNG terminal scheduling digital twin based on deep reinforcement learning
- New
- Research Article
- 10.1016/j.ress.2025.111951
- Mar 1, 2026
- Reliability Engineering & System Safety
- Jihao Duan + 4 more
Evacuation under terrorist attacks: A crowd congestion control method based on deep reinforcement learning
- New
- Research Article
- 10.1016/j.cej.2026.174781
- Mar 1, 2026
- Chemical Engineering Journal
- Muhammad Usman + 2 more
Deepcluster: A selective and stable strategy to predict the AuPt-based high entropy alloy nanoclusters via deep reinforcement learning
- New
- Research Article
- 10.1016/j.trip.2025.101795
- Mar 1, 2026
- Transportation Research Interdisciplinary Perspectives
- Mandana Farhang Ghahfarokhi + 3 more
Adaptive electric vehicle routing and charging with deep reinforcement learning
- New
- Research Article
- 10.1016/j.eswa.2025.130649
- Mar 1, 2026
- Expert Systems with Applications
- Jiaqi Shi + 4 more
Trajectory planning and tracking for UAVs with deep reinforcement learning and adaptive nonlinear MPC
- New
- Research Article
- 10.1016/j.asoc.2026.114614
- Mar 1, 2026
- Applied Soft Computing
- Shuyuan Liu + 4 more
A hierarchical adaptive navigation planner for UAVs in 3D complex environments based on deep reinforcement learning