TermiNet: A reinforcement learning framework for k-Path partitioning problem
- Conference Article
1
- 10.1109/aiiot54504.2022.9817159
- Jun 6, 2022
In this work, we present an ensemble reinforcement learning (ERL) framework comprising deep Q-networks (DQNs). The aim is to optimize the sum rate of a non-orthogonal multiple access unmanned aerial vehicle (NOMA-UAV) network. Downlink (DL) power and bandwidth allotment for a NOMA cluster are managed over a fixed UAV trajectory. The environment is dynamic, and quality of service (QoS) requirements vary for each node on the ground. A comparative analysis between the conventional reinforcement learning (CRL) framework and the proposed ERL framework shows a performance gain in the following metrics. The ERL achieves a 20 percent gain in average sum rate and a 2 percent gain in spectral efficiency over the conventional reinforcement learning framework with a single DQN. It also achieves high performance over different UAV speeds in cumulative sum rate and device coverage.
- Research Article
348
- 10.1109/jsac.2019.2904329
- Jun 1, 2019
- IEEE Journal on Selected Areas in Communications
This paper investigates a deep reinforcement learning (DRL)-based MAC protocol for heterogeneous wireless networking, referred to as Deep-reinforcement Learning Multiple Access (DLMA). Specifically, we consider the scenario of a number of networks operating different MAC protocols trying to access the time slots of a common wireless medium. A key challenge in our problem formulation is that we assume our DLMA network does not know the operating principles of the MACs of the other networks; that is, DLMA does not know how the other MACs make decisions on when to transmit and when not to. The goal of DLMA is to learn an optimal channel access strategy that achieves a certain pre-specified global objective. Possible objectives include maximizing the sum throughput and maximizing α-fairness among all networks. The underpinning learning process of DLMA is based on DRL. With proper definitions of the state space, action space, and rewards in DRL, we show that DLMA can easily maximize the sum throughput by judiciously selecting certain time slots to transmit. Maximizing general α-fairness, however, is beyond the means of the conventional reinforcement learning (RL) framework. We put forth a new multi-dimensional RL framework that enables DLMA to maximize general α-fairness. Our extensive simulation results show that DLMA can maximize sum throughput or achieve proportional fairness (two special classes of α-fairness) when coexisting with TDMA and ALOHA MAC protocols without knowing they are TDMA or ALOHA. Importantly, we show the merit of incorporating the use of neural networks into the RL framework (i.e., why DRL and not just traditional RL): specifically, the use of DRL allows DLMA (i) to learn the optimal strategy much faster and (ii) to be more robust in that it can still learn a near-optimal strategy even when the parameters in the RL framework are not optimally set.
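The abstract names sum throughput and proportional fairness as two special cases of α-fairness. A minimal sketch of the standard α-fair utility may make that relationship concrete: α = 0 reduces to the plain sum of throughputs, and α = 1 to the sum of logarithms (proportional fairness). The function name and the two throughput allocations below are illustrative assumptions, not values or code from the paper.

```python
import math

def alpha_fair_utility(throughputs, alpha):
    """Sum of standard alpha-fair utilities over per-network throughputs (all > 0)."""
    if alpha == 1.0:
        # alpha = 1 is the limiting case: proportional fairness (sum of logs)
        return sum(math.log(x) for x in throughputs)
    return sum(x ** (1.0 - alpha) / (1.0 - alpha) for x in throughputs)

# Two hypothetical allocations with the same total throughput
even = [0.5, 0.5]
skewed = [0.9, 0.1]

# alpha = 0: utility is the sum throughput, so both allocations score the same
sum_even = alpha_fair_utility(even, 0.0)
sum_skewed = alpha_fair_utility(skewed, 0.0)

# alpha = 1: proportional fairness prefers the even split
pf_even = alpha_fair_utility(even, 1.0)
pf_skewed = alpha_fair_utility(skewed, 1.0)
```

Under these assumptions the sum-throughput objective cannot tell the two allocations apart, while the proportional-fairness objective ranks the even split higher.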
- Research Article
87
- 10.1063/5.0052524
- Jun 1, 2021
- Physics of Fluids
In the present work, an efficient active flow control strategy for eliminating the vortex-induced vibration of a cylinder at Re = 100 is explored using two machine learning frameworks, ranging from active learning to reinforcement learning. Specifically, an adaptive control scheme using a pair of jets placed on the poles of the cylinder as actuators has been discovered. In the active learning framework, a Gaussian process regression surrogate model is used to predict the vibration amplitude of the cylinder from a limited number of numerical simulations by combining the Bayesian optimization algorithm with specified control actions, while in the reinforcement learning framework, the soft actor-critic deep reinforcement learning algorithm is adopted to construct a real-time control system. The results show that the triangle control agent in the active learning framework can reduce the vibration amplitude of the cylinder from A = 0.6 to A = 0.43. The real-time control in the reinforcement learning framework can successfully suppress the vibration amplitude to 0.11, a decrease of 82.7%. By comparison, there are some similarities in the amplitude and phase of the action trajectories between the two intelligent learning frameworks. They both aim at keeping track of the antiphase between the position and the action, which restrains the cylinder to a low-amplitude vibration. The underlying physics shows that the jet applies suction in the stage of vortex generation and injection in the stage of vortex shedding. The current findings provide a new concept for the typical flow control problem and make it more practical in industrial applications.
- Research Article
27
- 10.2202/1553-779x.1141
- May 20, 2006
- International Journal of Emerging Electric Power Systems
This paper presents the design and implementation of a learning controller for Automatic Generation Control (AGC) in power systems based on a reinforcement learning (RL) framework. In contrast to the recent RL scheme for AGC proposed by us, the present method permits handling of power system variables such as Area Control Error (ACE) and deviations from scheduled frequency and tie-line flows as continuous variables. (In the earlier scheme, these variables had to be quantized into finitely many levels.) The optimal control law is arrived at in the RL framework by making use of a Q-learning strategy. Since the state variables are continuous, we propose the use of Radial Basis Function (RBF) neural networks to compute the Q-values for a given input state. Since we cannot provide training data appropriate for the standard supervised learning framework in this application, a reinforcement learning algorithm is employed to train the RBF network. We also employ a novel exploration strategy, based on a Learning Automata algorithm, for generating training samples during Q-learning. The proposed scheme, in addition to being simple to implement, inherits all the attractive features of an RL scheme, such as model-independent design, flexibility in control objective specification, and robustness. Two implementations of the proposed approach are presented. Through simulation studies, the attractiveness of this approach is demonstrated.
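The idea of computing Q-values for a continuous state with an RBF network and training it by Q-learning can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's controller: the class name `RBFQ`, the scalar state, the fixed Gaussian centers, and the step sizes are all hypothetical, and the Learning Automata exploration strategy mentioned in the abstract is not modelled.

```python
import math

class RBFQ:
    """Q(s, a) as a weighted sum of Gaussian RBF features of a scalar state (illustrative)."""

    def __init__(self, centers, width, n_actions, lr=0.1, gamma=0.9):
        self.centers = centers          # fixed RBF centers over the state range (assumed)
        self.width = width              # shared Gaussian width (assumed)
        self.lr = lr                    # learning rate
        self.gamma = gamma              # discount factor
        # One weight vector per discrete action
        self.w = [[0.0] * len(centers) for _ in range(n_actions)]

    def features(self, s):
        return [math.exp(-((s - c) ** 2) / (2.0 * self.width ** 2)) for c in self.centers]

    def q(self, s, a):
        return sum(wi * phi for wi, phi in zip(self.w[a], self.features(s)))

    def update(self, s, a, r, s_next):
        """One Q-learning step: move Q(s, a) toward r + gamma * max_b Q(s_next, b)."""
        target = r + self.gamma * max(self.q(s_next, b) for b in range(len(self.w)))
        td_error = target - self.q(s, a)
        for i, phi in enumerate(self.features(s)):
            self.w[a][i] += self.lr * td_error * phi
        return td_error

# Hypothetical usage: a continuous error signal in [-1, 1] and two discrete actions
agent = RBFQ(centers=[-1.0, 0.0, 1.0], width=0.5, n_actions=2)
td = agent.update(s=0.0, a=1, r=1.0, s_next=0.5)
```

Because the features are continuous in the state, a single update also shifts the Q-values of nearby states, which is what removes the need to quantize the state variables.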
- Dataset
1
- 10.1037/e505772014-087
- Jan 1, 2013
- PsycEXTRA Dataset
Thesis directed by
- Conference Article
- 10.1109/globecom59602.2025.11431909
- Dec 8, 2025
In this research study, a Reconfigurable Intelligent Surface (RIS)-assisted Multiple-Input Single-Output (MISO) downlink wireless communication system is implemented in three-dimensional (3D) space. Each entity (base station (BS), RIS, and user equipment (UE)) has its own geographical coordinates (X, Y, Z). The distances between the fixed entities (BS and RIS) and the randomly located UEs are computed using the Euclidean norm. The BS-RIS and RIS-UE links are modeled as Rician fading channels, whereas the BS-UE links are modeled as Rayleigh fading channels. Joint beamforming at the BS and phase shifting at the RIS is formulated to provide better spectral efficiency to the UEs. To achieve this, the entire communication system is translated into a Reinforcement Learning (RL) framework with state, action, and reward. A Randomized Ensembled Double Q-Learning (REDQ) RL agent, which belongs to the model-free, off-policy RL family, is trained on this RL framework. REDQ produces a better average spectral efficiency than other model-free, off-policy RL agents such as Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Soft Actor-Critic (SAC) in all experimental setups (signal power and RIS elements). This improvement is attributed to REDQ’s better update-to-data (UTD) ratio, ensemble of critics, and controlled Q-function variance with a common target value for all critics. These findings underscore the potential of REDQ in enhancing performance in RIS-assisted wireless communication systems.
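The distance computation described above can be illustrated with a short sketch. The coordinates below are hypothetical values chosen so the results are easy to check by hand; they are not taken from the paper, and the fading and beamforming models are not represented.

```python
import math

# Hypothetical (X, Y, Z) coordinates in metres; assumptions for illustration only
bs = (0.0, 0.0, 12.0)    # base station
ris = (3.0, 4.0, 12.0)   # reconfigurable intelligent surface
ue = (3.0, 4.0, 0.0)     # one user equipment

# Euclidean norm of the coordinate difference for each link
d_bs_ue = math.dist(bs, ue)      # direct BS-UE link
d_bs_ris = math.dist(bs, ris)    # BS-RIS link
d_ris_ue = math.dist(ris, ue)    # RIS-UE link

# Length of the reflected path BS -> RIS -> UE
d_reflected = d_bs_ris + d_ris_ue
```

With these assumed positions the direct link is 13 m, while the reflected path is 5 m + 12 m = 17 m.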
- Research Article
8
- 10.3390/agriengineering7080252
- Aug 7, 2025
- AgriEngineering
Nitrous oxide (N2O) emissions from agriculture are rising due to increased fertilizer use and intensive farming, posing a major challenge for climate mitigation. This study introduces a novel reinforcement learning (RL) framework to optimize farm management strategies that balance crop productivity with environmental impact, particularly N2O emissions. By modeling agricultural decision-making as a partially observable Markov decision process (POMDP), the framework accounts for uncertainties in environmental conditions and observational data. The approach integrates deep Q-learning with recurrent neural networks (RNNs) to train adaptive agents within a simulated farming environment. A Probabilistic Deep Learning (PDL) model was developed to estimate N2O emissions, achieving a high Prediction Interval Coverage Probability (PICP) of 0.937 within a 95% confidence interval on the available dataset. While the PDL model’s generalizability is currently constrained by the limited observational data, the RL framework itself is designed for broad applicability, capable of extending to diverse agricultural practices and environmental conditions. Results demonstrate that RL agents reduce N2O emissions without compromising yields, even under climatic variability. The framework’s flexibility allows for future integration of expanded datasets or alternative emission models, ensuring scalability as more field data becomes available. This work highlights the potential of artificial intelligence to advance climate-smart agriculture by simultaneously addressing productivity and sustainability goals in dynamic real-world settings.
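The Prediction Interval Coverage Probability (PICP) quoted above is, in its usual definition, the fraction of observations that fall inside their predicted interval. A minimal sketch of that calculation follows; the function name and the toy numbers are assumptions for illustration and are unrelated to the study's dataset.

```python
def picp(y_true, lower, upper):
    """Fraction of observations lying inside their [lower, upper] prediction interval."""
    inside = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return inside / len(y_true)

# Toy values (hypothetical): the third observation falls outside its interval
observed = [1.0, 2.5, 0.3, 4.0]
lower = [0.5, 2.0, 0.5, 3.0]
upper = [1.5, 3.0, 1.0, 5.0]

coverage = picp(observed, lower, upper)
```

For a well-calibrated 95% interval, the PICP would be expected to sit close to 0.95, which is the sense in which the reported 0.937 is read as high coverage.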
- Research Article
27
- 10.1109/tmi.2021.3069663
- Mar 30, 2021
- IEEE Transactions on Medical Imaging
Accurate standard plane (SP) localization is the fundamental step for prenatal ultrasound (US) diagnosis. Typically, dozens of US SPs are collected to determine the clinical diagnosis. 2D US requires a separate scan for each SP, which is time-consuming and operator-dependent. In contrast, 3D US captures multiple SPs in one shot and has the inherent advantages of less user dependency and more efficiency. Automatically locating SPs in 3D US is very challenging due to the huge search space and large fetal posture variations. Our previous study proposed a deep reinforcement learning (RL) framework with an alignment module and active termination to localize SPs in 3D US automatically. However, termination of the agent search in RL is important and affects practical deployment. In this study, we enhance our previous RL framework with a newly designed adaptive dynamic termination that enables an early stop of the agent search, saving up to 67% of inference time and thus boosting the accuracy and efficiency of the RL framework at the same time. In addition, we validate the effectiveness and generalizability of our algorithm extensively on our in-house multi-organ datasets containing 433 fetal brain volumes, 519 fetal abdomen volumes, and 683 uterus volumes. Our approach achieves localization errors of 2.52 mm/10.26°, 2.48 mm/10.39°, 2.02 mm/10.48°, 2.00 mm/14.57°, 2.61 mm/9.71°, 3.09 mm/9.58°, and 1.49 mm/7.54° for the transcerebellar, transventricular, and transthalamic planes in the fetal brain, the abdominal plane in the fetal abdomen, and the mid-sagittal, transverse, and coronal planes in the uterus, respectively. Experimental results show that our method is general and has the potential to improve the efficiency and standardization of US scanning.
- Research Article
27
- 10.1109/access.2022.3151771
- Jan 1, 2022
- IEEE Access
This paper presents a novel reinforcement learning (RL) framework to design cascade feedback control policies for 3D bipedal locomotion. Existing RL algorithms are often trained in an end-to-end manner or rely on prior knowledge of some reference joint or task space trajectories. Unlike these studies, we propose a policy structure that decouples the bipedal locomotion problem into two modules that incorporate the physical insights from the nature of the walking dynamics and the well-established Hybrid Zero Dynamics approach for 3D bipedal walking. As a result, the overall RL framework has several key advantages, including lightweight network structure, sample efficiency, and less dependence on prior knowledge. The proposed solution learns stable and robust walking gaits from scratch and allows the controller to realize omnidirectional walking with accurate tracking of the desired velocity and heading angle. The learned policies also perform robustly against various adversarial forces applied to the torso and walking blindly on a series of challenging and unstructured terrains. These results demonstrate that the proposed cascade feedback control policy is suitable for navigation of 3D bipedal robots in indoor and outdoor environments.
- Research Article
- 10.1109/lra.2025.3625520
- Dec 1, 2025
- IEEE Robotics and Automation Letters
Autonomous Surface Vehicles (ASVs) play a crucial role in maritime operations, yet their navigation in shallow-water environments remains challenging due to dynamic disturbances and depth constraints. Traditional navigation strategies struggle with limited sensor information, making safe and efficient navigation difficult. In this paper, we propose a reinforcement learning (RL) framework for ASV navigation under depth constraints, where the vehicle must reach a target while avoiding unsafe areas with only a single depth measurement per timestep from a downward-facing Single Beam Echosounder (SBES). To enhance environmental awareness, we integrate Gaussian Process (GP) regression into the RL framework, enabling the agent to progressively estimate a bathymetric depth map from sparse sonar readings. This approach improves decision-making by providing a richer representation of the environment. Furthermore, we demonstrate effective sim-to-real transfer, ensuring that policies generalize well to real-world aquatic conditions. Experimental results validate our method's capability to improve ASV navigation performance while maintaining safety in challenging shallow-water environments.
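How Gaussian Process regression can turn sparse depth readings into a depth estimate can be sketched with the standard zero-mean GP posterior mean. This is a simplified, assumption-laden illustration: it uses a 1-D transect instead of a 2-D bathymetric map, an RBF kernel with a hand-picked length scale, and made-up readings; none of these choices come from the paper.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two scalar positions."""
    return math.exp(-((a - b) ** 2) / (2.0 * length ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def gp_mean(xs, ys, x_query, noise=1e-6, length=1.0):
    """Posterior mean of a zero-mean GP at x_query given observations (xs, ys)."""
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    alpha = solve(K, ys)  # alpha = (K + noise * I)^-1 y
    return sum(rbf(x_query, a, length) * w for a, w in zip(xs, alpha))

# Hypothetical single-beam depth readings (m) at positions along a transect (m)
positions = [0.0, 1.0, 2.0, 3.0]
depths = [2.0, 1.5, 1.0, 1.8]

depth_at_reading = gp_mean(positions, depths, 1.0)   # close to the measured 1.5
depth_between = gp_mean(positions, depths, 1.5)      # interpolated estimate
```

In practice the readings would typically be centred around a mean depth first, because a zero-mean GP falls back to 0 far away from any measurement.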
- Conference Article
43
- 10.1109/icra40945.2020.9197175
- May 1, 2020
This paper presents a novel model-free reinforcement learning (RL) framework to design feedback control policies for 3D bipedal walking. Existing RL algorithms are often trained in an end-to-end manner or rely on prior knowledge of some reference joint trajectories. Unlike these studies, we propose a novel policy structure that appropriately incorporates physical insights gained from the hybrid nature of the walking dynamics and the well-established hybrid zero dynamics approach for 3D bipedal walking. As a result, the overall RL framework has several key advantages, including lightweight network structure, short training time, and less dependence on prior knowledge. We demonstrate the effectiveness of the proposed method on Cassie, a challenging 3D bipedal robot. The proposed solution produces stable limit walking cycles that can track various walking speeds in different directions. Surprisingly, without being specifically trained with disturbances to achieve robustness, it also performs robustly against various adversarial forces applied to the torso in both the forward and backward directions.
- Research Article
5
- 10.1186/s13321-025-01006-3
- Apr 21, 2025
- Journal of Cheminformatics
The integration of artificial intelligence (AI) in drug discovery offers promising opportunities to streamline and enhance the traditional drug development process. One core challenge in de novo molecular design is modeling complex structure-activity relationships (SAR), such as activity cliffs, where minor molecular changes yield significant shifts in biological activity. In response to the limitations of current models in capturing these critical discontinuities, we propose the Activity Cliff-Aware Reinforcement Learning (ACARL) framework. ACARL leverages a novel activity cliff index to identify and amplify activity cliff compounds, uniquely incorporating them into the reinforcement learning (RL) process through a tailored contrastive loss. This RL framework is designed to focus model optimization on high-impact regions within the SAR landscape, improving the generation of molecules with targeted properties. Experimental evaluations across multiple protein targets demonstrate ACARL’s superior performance in generating high-affinity molecules compared to existing state-of-the-art algorithms. These findings indicate that ACARL effectively integrates SAR principles into the RL-based drug design pipeline, offering a robust approach for de novo molecular design. Scientific contribution: Our work introduces a machine learning-based drug design framework that explicitly models activity cliffs, a first in AI-driven molecular design. ACARL’s primary technical contributions include the formulation of an activity cliff index to detect these critical points, and a contrastive RL loss function that dynamically enhances the generation of activity cliff compounds, optimizing the model for high-impact SAR regions. This approach demonstrates the efficacy of combining domain knowledge with machine learning advances, significantly expanding the scope and reliability of AI in drug discovery.
- Research Article
14
- 10.3390/app132011275
- Oct 13, 2023
- Applied Sciences
Intrusion detection systems (IDSs) play a pivotal role in safeguarding networks and systems against malicious activities. However, the challenge of imbalanced datasets significantly impacts IDS research, skewing learning models towards the majority class and diminishing accuracy for the minority class. This study introduces the Reinforcement Learning (RL) Framework with Oversampling and Undersampling Algorithm (RLFOUA) to address imbalanced datasets. RLFOUA combines RL with diverse resampling algorithms, creating an adaptive learning environment. It integrates the novel True False Rate Synthetic Minority Oversampling Technique (TFRSMOTE) algorithm, emphasizing data-level approaches. Additionally, RLFOUA employs a cost-sensitive approach based on classification metrics. Using the CSE-CIC-IDS2018 and NSL-KDD datasets, RLFOUA demonstrates substantial improvement over existing resampling techniques. Achieving an accuracy of 0.9981 for NSL-KDD and 0.9846 for CSE-CIC-IDS2018, the framework’s performance is evaluated using F1 score, accuracy, precision, recall, and a proposed Index Metric (IM). RLFOUA presents a significant advancement in addressing class imbalance challenges in IDS. It shows an average accuracy improvement of 21.5% compared to the recent resampling technique AESMOTE on the NSL-KDD dataset.
- Conference Article
- 10.21528/cbic2025-1146000
- Dec 1, 2025
Classification with Costly Features (CwCF) addresses the challenge of balancing feature acquisition costs with classification accuracy. Traditional methods often rely on static feature subsets, leading to suboptimal resource utilization. This paper proposes a novel reinforcement learning (RL) framework enhanced with Transformer architecture to dynamically select features in a sequential manner, optimizing cost-accuracy trade-offs. Our model leverages the Transformer architecture to handle variable-length input sequences, combining embeddings for both feature values and their respective feature identities, while integrating Deep Q-Networks for decision-making. Experiments across synthetic and real-world benchmarks demonstrate that the Transformer-based approach achieves competitive accuracy compared to prior RL methods, though it falls short of state-of-the-art (SotA) models in cumulative return. The results highlight the model’s ability to adaptively gather informative features, emphasizing accuracy over cost minimization. This work brings out the potential of sequence-based architectures in dynamic feature selection and opens discussion for further exploration of Transformer-RL hybrids in cost-sensitive classification tasks.
- Research Article
45
- 10.1109/tits.2023.3265517
- Aug 1, 2023
- IEEE Transactions on Intelligent Transportation Systems
This study aims to determine the optimal deployment plan for EV fast charging stations in a transportation network with a limited budget. The objective of the deployment problem is to maximize the quality of service (QoS) with respect to both waiting time and range anxiety from the perspective of EV customers. With the rapid growth of electric vehicle (EV) market penetration, state-of-the-art algorithms based on mathematical programming are limited in handling high-dimensional optimization problems adequately. Unlike previous studies, we make the first attempt to formulate the fast charging station deployment problem (FCSDP) as a finite discrete Markov decision process (MDP) in a novel reinforcement learning (RL) framework to alleviate the curse of dimensionality. Since creating a supervised training dataset is impractical due to the high computational complexity of the FCSDP, we propose a recurrent neural network (RNN) with an attention mechanism to learn the model parameters and determine the optimal policy in a completely unsupervised manner. Finally, numerical experiments are conducted on multiple problem sizes to evaluate the performance of the RNN-based RL framework. Simulation results show that the proposed approach outperforms the algorithms it is compared against in terms of solution quality and computation time.