Off-policy Reinforcement Learning Algorithm Research Articles

In response to the problem of time-varying spherical formation control for a heterogeneous unmanned aerial vehicle (UAV) swarm system with dynamic uncertainty in the system model, this paper proposes an optimal distributed formation containment control method based on reinforcement learning. By combining the time-varying formation containment vector design with a distributed predefined-time observer, and by constructing an augmented system consisting of the multi-quadrotor UAV system and the observer, the time-varying formation containment control problem of the heterogeneous swarm is transformed into a stabilization problem. By introducing a value function with a discount factor, this stabilization problem is further transformed into an optimal control problem. Using an actor-critic neural network combined with an off-policy reinforcement learning algorithm and distributed control methods, the formation containment controller is obtained in a data-driven manner. The stability of the distributed observer and the formation containment controller, as well as the convergence of the reinforcement learning algorithm, is proved using Lyapunov theory and related results. Numerical simulations show that the observation error of the designed distributed observer converges within the predefined time of 0.1 s and that the overall formation tracking error of the heterogeneous swarm converges within 2.74 s, validating the effectiveness and superiority of the designed control scheme.
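The central idea above, that a discounted value function turns a stabilization problem into an optimal control problem which an off-policy algorithm can then solve from data, can be illustrated on a much simpler case. The sketch below is not the paper's actor-critic scheme for UAV swarms: it assumes a single discrete-time linear system with a quadratic cost and uses policy iteration on a quadratic Q-function (a standard off-policy Q-learning technique for linear-quadratic problems) instead of neural networks. The plant matrices `A` and `B`, the weights `Qc` and `Rc`, the discount `gamma`, and all function names are hypothetical. Transitions are collected once with random inputs, and every policy evaluation reuses that same batch, which is what makes the method off-policy.

```python
import numpy as np

def quad_features(z):
    """Basis such that z @ H @ z == theta @ quad_features(z) for symmetric H."""
    n = len(z)
    return np.array([z[i] * z[j] * (1.0 if i == j else 2.0)
                     for i in range(n) for j in range(i, n)])

def theta_to_kernel(theta, n):
    """Rebuild the symmetric Q-function kernel H from its upper triangle."""
    H = np.zeros((n, n))
    H[np.triu_indices(n)] = theta  # row-major order, same as quad_features
    return H + np.triu(H, 1).T

def off_policy_q_learning(X, U, X_next, Qc, Rc, gamma, K0, iters=20):
    """Policy iteration on Q(x, u) = z'Hz, z = [x; u], using one fixed batch of
    transitions (x, u, x_next) generated by an arbitrary behaviour policy."""
    n, m = X.shape[1], U.shape[1]
    # Stage costs x'Qx + u'Ru do not depend on the policy being evaluated.
    cost = np.einsum("ki,ij,kj->k", X, Qc, X) + np.einsum("ki,ij,kj->k", U, Rc, U)
    K = K0
    for _ in range(iters):
        U_next = -X_next @ K.T  # target-policy action at the next state
        Phi = np.array([quad_features(np.concatenate([x, u]))
                        - gamma * quad_features(np.concatenate([xn, un]))
                        for x, u, xn, un in zip(X, U, X_next, U_next)])
        # Policy evaluation: least-squares fit of the discounted Bellman equation.
        theta = np.linalg.lstsq(Phi, cost, rcond=None)[0]
        H = theta_to_kernel(theta, n + m)
        # Policy improvement: argmin_u z'Hz gives u = -inv(H_uu) H_ux x.
        K = np.linalg.solve(H[n:, n:], H[n:, :n])
    return K

def discounted_lqr_gain(A, B, Qc, Rc, gamma, iters=2000):
    """Model-based reference gain from the discounted Riccati recursion."""
    P = Qc.copy()
    for _ in range(iters):
        G = Rc + gamma * B.T @ P @ B
        P = Qc + gamma * A.T @ P @ A - gamma**2 * A.T @ P @ B @ np.linalg.solve(G, B.T @ P @ A)
    return gamma * np.linalg.solve(Rc + gamma * B.T @ P @ B, B.T @ P @ A)

# Hypothetical plant and weights; the learner only ever sees sampled transitions.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Qc, Rc, gamma = np.eye(2), np.eye(1), 0.9

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
U = rng.standard_normal((200, 1))  # exploratory behaviour inputs
X_next = X @ A.T + U @ B.T

K_learned = off_policy_q_learning(X, U, X_next, Qc, Rc, gamma, K0=np.zeros((1, 2)))
K_ref = discounted_lqr_gain(A, B, Qc, Rc, gamma)
```

In this sketch `K_learned` should agree with the model-based gain `K_ref`, even though the learner never uses `A` or `B`. Fitting the kernel `H` of the Q-function, rather than a value matrix alone, is what lets the improvement step avoid the model. The initial gain `K0 = 0` is only admissible here because `gamma` times the squared spectral radius of `A` is below one.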

CEM-TD3 combines the simple cross-entropy method (CEM) with Twin Delayed Deep Deterministic policy gradient (TD3) and achieves a satisfactory trade-off between performance and sample efficiency. However, we find that CEM-TD3 does not fully address the low efficiency of the policy search performed by CEM, and that the policy gradient learning introduced by TD3 weakens the diversity of individuals in the population. In this paper, we propose Double Buffers CEM-TD3 (DBCEM-TD3), which improves both the CEM and the TD3 components. For CEM, DBCEM-TD3 maintains an actor buffer that stores the population required for evolution. In each iteration, it only needs to generate a small number of new actors to replace the poor actors in the actor buffer, which makes evolution more efficient. The fitness of individuals in the actor buffer decays exponentially over time, which helps prevent premature convergence of the mean actor. For TD3, DBCEM-TD3 maintains a critic buffer containing as many critics as the number of actors generated in each iteration, and each critic is trained independently on samples drawn from the shared replay buffer. In each iteration, each newly generated actor is guided by a different critic. This yields more diverse behaviors among the learned actors, so richer experiences are collected during the evaluation phase. We evaluate DBCEM-TD3 on five continuous control tasks provided by OpenAI Gym, where it outperforms CEM-TD3, TD3, and other classic off-policy reinforcement learning algorithms in terms of performance and sample efficiency.
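The actor-buffer mechanism on the CEM side of DBCEM-TD3 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. Actors are plain parameter vectors, and CEM uses a diagonal Gaussian. The exponential decay is modelled as multiplying every stored fitness by a hypothetical factor `decay` once per iteration, which only lowers scores if they are non-negative. The names `actor_buffer_step`, `n_new`, `n_elite`, `eps`, and the toy fitness function are hypothetical. The TD3 side (critic buffer, per-actor critics, gradient updates) is omitted.

```python
import numpy as np

def actor_buffer_step(params, fitness, mean, var, fitness_fn, rng,
                      n_new=2, n_elite=4, decay=0.9, eps=1e-3):
    """One CEM step on an actor buffer (illustrative, not the authors' code).

    params:  (buffer_size, dim) array of stored actor parameter vectors
    fitness: (buffer_size,) array of their stored, non-negative scores
    """
    # Age the stored scores so stale evaluations gradually lose influence
    # (multiplicative decay assumes non-negative fitness).
    params, fitness = params.copy(), fitness * decay
    # Sample only a few new actors from the current diagonal Gaussian.
    new = mean + np.sqrt(var) * rng.standard_normal((n_new, mean.size))
    # Overwrite the lowest-scoring (after decay) entries with the new actors.
    worst = np.argsort(fitness)[:n_new]
    params[worst] = new
    fitness[worst] = [fitness_fn(p) for p in new]
    # Standard CEM refit on the elite actors currently in the buffer;
    # eps keeps the sampling variance from collapsing to zero.
    elite = params[np.argsort(fitness)[-n_elite:]]
    return params, fitness, elite.mean(axis=0), elite.var(axis=0) + eps

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = np.array([1.0, -2.0, 0.5])  # hypothetical optimum of the toy task
    fitness_fn = lambda p: float(np.exp(-np.sum((p - target) ** 2)))  # positive score
    params = rng.standard_normal((10, 3))
    fitness = np.array([fitness_fn(p) for p in params])
    mean, var = params.mean(axis=0), params.var(axis=0)
    for _ in range(100):
        params, fitness, mean, var = actor_buffer_step(
            params, fitness, mean, var, fitness_fn, rng)
    print("CEM mean after 100 steps:", mean)
```

Only `n_new` actors are evaluated per step, while the elite set is drawn from the whole buffer. Because stored scores shrink each iteration, an old actor that is never re-evaluated eventually ranks among the worst and is replaced, which is one simple way to keep the mean actor from locking onto stale elites.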


Articles published on Off-policy Reinforcement Learning Algorithm

Quadratic Tracking Control of Linear Stochastic Systems with Unknown Dynamics Using Average Off-Policy Q-Learning Method

Heterogeneous unmanned swarm formation containment control based on reinforcement learning

Integrating an Ensemble Reward System into an Off-Policy Reinforcement Learning Algorithm for the Economic Dispatch of Small Modular Reactor-Based Energy Systems

Integrating human learning and reinforcement learning: A novel approach to agent training

Utilizing reinforcement learning for de novo drug design

Optimal Tracking Control of Heterogeneous MASs Using Event-Driven Adaptive Observer and Reinforcement Learning

Double Buffers CEM-TD3: More Efficient Evolution and Richer Exploration

Robust hierarchical games of linear discrete-time systems based on off-policy model-free reinforcement learning

Distributed Minmax Strategy for Multiplayer Games: Stability, Robustness, and Algorithms

Off-policy RL algorithms can be sample-efficient for continuous control via sample multiple reuse

Model-free control of underwater vehicle-manipulator system interacting with unknown environments

Model-Free Quantum Gate Design and Calibration Using Deep Reinforcement Learning

An Off-Policy Reinforcement Learning Algorithm Customized for Multi-Task Fusion in Large-Scale Recommender Systems

Scalable reinforcement learning approaches for dynamic pricing in ride-hailing systems

A reinforcement learning integral sliding mode control scheme against lumped disturbances in hot strip rolling

Reinforcement Learning and Optimal Control of PMSM Speed Servo System

Model-Free H∞ Output Feedback Control of Road Sensing in Vehicle Active Suspension Based on Reinforcement Learning

Off-Policy Learning-Based Following Control of Cooperative Autonomous Vehicles Under Distributed Attacks

Robust Output Regulation and Reinforcement Learning-Based Output Tracking Design for Unknown Linear Discrete-Time Systems

Reinforcement Learning for Adaptive Optimal Stationary Control of Linear Stochastic Systems
