Policy Gradient Reinforcement Learning Research Articles

Object recognition on unmanned aerial vehicles (UAVs) equipped with “You Only Look Once” (YOLO) has been validated in actual flights. The remaining challenge is to use camera gimbal control efficiently so that images of YOLO-identified targets can be captured swiftly during aerial search missions. Enhancing the UAV’s energy efficiency and search effectiveness is imperative. This study establishes a simulation environment in the Unity simulation software for tracking a target by controlling the gimbal, and develops deep deterministic policy gradient (DDPG) reinforcement learning techniques to train the gimbal to execute effective tracking actions. The simulation results indicate that when actions are appropriately rewarded or penalized in the form of scores, the reward value converges consistently within the range of 19–35, which implies that a successful strategy leads to consistently high rewards. Consequently, a refined set of training procedures is devised that enables the gimbal to track the target accurately while minimizing unnecessary tracking actions, thus enhancing tracking efficiency. Training in a simulated environment has several benefits. The training uses a dataset composed of actual flight photographs, and offline operations can be conducted at any time without constraints of time or place. The findings demonstrate that a coherent set of action strategies can be learned with DDPG reinforcement learning, and that these strategies enable the UAV’s gimbal to track designated targets rapidly and precisely. The approach therefore provides both convenience and the opportunity to gather more flight-scenario training data in the future, which will create immediate training opportunities and help improve the system’s energy consumption.
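The idea of rewarding or penalizing gimbal actions with scores can be illustrated with a small sketch. The function below is a hypothetical shaped reward, not the scoring scheme used in the study: the name `gimbal_reward`, the tolerance `center_tol`, and the weight `action_cost` are all assumptions chosen for illustration. It gives a bonus when the target sits near the image center and charges a small cost for gimbal movement, which is one plausible way to discourage unnecessary tracking actions.

```python
import math


def gimbal_reward(err_pan, err_tilt, d_pan, d_tilt,
                  center_tol=0.1, action_cost=0.05):
    """Hypothetical shaped reward for one tracking step (illustrative only).

    err_pan, err_tilt: target offset from the image center after the action.
    d_pan, d_tilt:     pan and tilt change commanded in this step.
    """
    # Distance of the target from the image center.
    dist = math.hypot(err_pan, err_tilt)
    # Bonus when the target is centered, otherwise a penalty that grows with distance.
    reward = 1.0 if dist <= center_tol else -dist
    # Movement cost, so that needless gimbal motion lowers the score.
    reward -= action_cost * math.hypot(d_pan, d_tilt)
    return reward
```

Under these assumptions, holding a centered target without moving scores highest, and the same centered target reached with a large movement scores slightly lower.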

Reinforcement learning (RL) is an artificial intelligence (AI) approach that learns through trial and error, and it has demonstrated performance exceeding that of humans and existing algorithms in multiple domains. We hypothesized that machine parameter optimization using RL would produce deliverable treatment plans faster and with superior dosimetry compared to conventional inverse optimization, and we tested this hypothesis in the setting of VMAT for localized prostate cancer. We included 151 previously treated prostate cancer patients, separated into training (n = 116) and testing (n = 35) cohorts. We used a policy gradient RL approach in which a 3D convolutional neural network (CNN) takes the contours, dose distribution, and machine parameters for the current control point as input and predicts continuous machine parameters for the next control point, allowing the CNN to directly control the linear accelerator (Linac) model. Training was initialized using a subset of 20 training cases with plans produced in our clinical treatment planning system (TPS) prescribed 60 Gy in 20 fractions (denoted Training-TPS), and then continued for all 116 training cases through trial-and-error-based RL. Following training, RL VMAT was applied to the initialization subset of 20 training cases (Training-RL) and to the 35 test cases (Test-RL), and the resultant dosimetry was compared. Training was conducted for 10 days using 4 GPUs and generated 22,000 plans for exploration, enabling dosimetric improvements compared to the TPS-based plans and generalization to all 116 training cases. Following training, mean ± SD RL execution time for automatic plan generation was 2.9 ± 0.6 seconds. Automatic gradient descent-based parameter rescaling was applied to enable quantitative comparison with the TPS-based plans, and final dose metrics are provided in Table 1. Overall dosimetry was comparable between the TPS and RL approaches in the training and test sets, although PTV coverage and maximum dose did not meet objectives in several RL-based plans. To our knowledge, this is the first demonstration of direct RL-based control of a 3D Linac model. Initial performance is promising, demonstrating model convergence in the training set with only a small number of existing plans required for initialization. However, limitations in model generalizability remain and are under continued investigation using approaches that RL makes possible, including data augmentation.
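To make the phrase "policy gradient" concrete, the following is a minimal toy sketch of a REINFORCE-style update for a single continuous action. It is not the study's method: the abstract does not say which policy gradient estimator was used, and the CNN, the Linac model, and the dosimetric reward are replaced here by a one-parameter Gaussian policy and a made-up quadratic reward. The name `train_gaussian_policy` and all parameters are assumptions for illustration.

```python
import random


def train_gaussian_policy(target=2.0, steps=4000, lr=0.005, sigma=0.5, seed=0):
    """Toy REINFORCE on one continuous action (illustrative assumptions only)."""
    rng = random.Random(seed)
    mu = 0.0        # learnable mean of the Gaussian policy
    baseline = 0.0  # running average of rewards, used to reduce variance
    for _ in range(steps):
        # Sample a continuous action from the current policy N(mu, sigma^2).
        action = rng.gauss(mu, sigma)
        # Made-up reward: highest when the action equals the target value.
        reward = -(action - target) ** 2
        # Gradient of log N(action | mu, sigma^2) with respect to mu.
        grad_log_prob = (action - mu) / sigma ** 2
        # REINFORCE step: move mu toward actions that beat the baseline.
        mu += lr * (reward - baseline) * grad_log_prob
        # Update the baseline after the step so it depends only on past rewards.
        baseline += 0.05 * (reward - baseline)
    return mu
```

Calling `train_gaussian_policy()` should return a mean close to the target of 2.0, because the expected update equals the gradient of the expected reward with respect to `mu`. A network-based policy follows the same pattern, with `mu` produced by the network from the current state.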

Articles published on Policy Gradient Reinforcement Learning

Adaptive reinforcement learning-based control using proximal policy optimization and slime mould algorithm with experimental tower crane system validation

Manipulating Camera Gimbal Positioning by Deep Deterministic Policy Gradient Reinforcement Learning for Drone Object Detection

Deep Reinforcement Learning Lane-Changing Decision Algorithm for Intelligent Vehicles Combining LSTM Trajectory Prediction

Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning

RRIoT: Recurrent reinforcement learning for cyber threat detection on IoT devices

Data-Driven Optimal Bipartite Consensus Control for Second-Order Multiagent Systems via Policy Gradient Reinforcement Learning

A perspective on the use of deep deterministic policy gradient reinforcement learning for retention time modeling in reversed-phase liquid chromatography

A DDPG-based energy efficient federated learning algorithm with SWIPT and MC-NOMA

Robust interplanetary trajectory design under multiple uncertainties via meta-reinforcement learning

Cooperative control of velocity and heading for unmanned surface vessel based on twin delayed deep deterministic policy gradient with an integral compensator

Machine Parameter Optimization of a Clinical Linear Accelerator Using Deep Reinforcement Learning for Automatic Generation of Deliverable Prostate VMAT Plans

Embedding active learning in batch-to-batch optimization using reinforcement learning

Management of Congestion in Distribution Networks Utilizing Demand Side Management and Reinforcement Learning

Bayesian sequential optimal experimental design for nonlinear models using policy gradient reinforcement learning

Decentralized multi-agent control of a three-tank hybrid system based on twin delayed deep deterministic policy gradient reinforcement learning algorithm

Molecule generation using transformers and policy gradient reinforcement learning

A Hybrid Reinforcement Learning Approach With a Spiking Actor Network for Efficient Robotic Arm Target Reaching

Data-Based Predictive Control via Multistep Policy Gradient Reinforcement Learning

Combining Neural Networks with Logic Rules

Function approximation reinforcement learning of energy management with the fuzzy REINFORCE for fuel cell hybrid electric vehicles
