Continuous Action Spaces Research Articles

The reconnaissance of high-value targets is a prerequisite for effective operations. Deep reinforcement learning (DRL) has recently gained attention for its success in navigation problems, but because the military field is competitive and complex, applications of DRL there remain unsatisfactory. In this paper, an end-to-end DRL-based intelligent reconnaissance mission planning method is proposed for dual unmanned aerial vehicle (dual-UAV) cooperative reconnaissance missions under high-threat and dense situations. Specific mission properties and parameter requirements are considered throughout the modelling. Firstly, the reconnaissance mission is described as a Markov decision process (MDP), and the mission planning model based on DRL is established. Secondly, the environment and UAV motion parameters are standardized before being input to the neural network, in order to reduce the difficulty of algorithm convergence. According to the concrete mission requirements of avoiding detection by radars, dual-UAV cooperation, and wandering reconnaissance, four weighted reward functions are designed to improve the agent's understanding of the mission. To avoid sparse reward, the clip function is used to control the range of the reward value. Finally, considering the continuous action space of reconnaissance mission planning, the widely applicable proximal policy optimization (PPO) algorithm is used. The simulation combines offline training and online planning. When the locations of the ground detection areas are changed and their number is varied from 1 to 4, the PPO-based model maintains a reconnaissance proportion of 20% and a mission completion rate of 90%, and helps the reconnaissance UAV complete efficient path planning. The model adapts to unknown, continuous, high-dimensional environmental changes, generalizes, and shows strong intelligent planning performance.
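The standardization and reward-shaping steps described in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual reward terms, weights, or clip bounds, so the function names, the [-1, 1] clip range, and the example values below are all assumptions chosen for illustration, not the authors' implementation.

```python
from typing import Sequence


def normalize(value: float, low: float, high: float) -> float:
    """Min-max scale a raw quantity into [0, 1]; the bounds are assumed known."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return (value - low) / (high - low)


def clip(value: float, low: float, high: float) -> float:
    """Restrict a value to the closed interval [low, high]."""
    return max(low, min(high, value))


def shaped_reward(
    terms: Sequence[float],
    weights: Sequence[float],
    low: float = -1.0,   # hypothetical lower clip bound
    high: float = 1.0,   # hypothetical upper clip bound
) -> float:
    """Weighted sum of reward terms, clipped to a fixed range."""
    if len(terms) != len(weights):
        raise ValueError("terms and weights must have the same length")
    total = sum(w * t for w, t in zip(weights, terms))
    return clip(total, low, high)


# Hypothetical example: four reward terms (e.g. radar avoidance, cooperation,
# wandering reconnaissance, mission progress) with equal weights.
example = shaped_reward([1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5])
```

In this sketch the weighted sum of the four example terms is 2.0, so the clip step caps the returned reward at the upper bound. Keeping rewards in a fixed range is a common way to stabilize policy-gradient training such as PPO.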

In a cell-free wireless network, distributed access points (APs) jointly serve all user equipments (UEs) within their coverage area by using the same time/frequency resources. In this paper, we develop a novel downlink cell-free multiple-input multiple-output (MIMO) millimeter wave (mmWave) network architecture that enables all APs and UEs to dynamically self-partition into a set of independent cell-free subnetworks on a per-time-slot basis. To this end, we propose several network partitioning algorithms based on deep reinforcement learning (DRL). Furthermore, to mitigate interference between different cell-free subnetworks, we develop a novel hybrid analog beamsteering-digital beamforming model that zero-forces interference among cell-free subnetworks and at the same time maximizes the instantaneous sum-rate of all UEs within each subnetwork. Specifically, the hybrid beamforming model is implemented by using a novel mixed DRL-convex optimization method in which analog beamsteering between APs and UEs is conducted based on DRL while digital beamforming is modeled and solved as a convex optimization problem. The DRL models for network clustering and hybrid beamsteering are combined into a single hierarchical DRL design that enables exchange of the DRL agents' experiences during both network training and operation. We also benchmark the DRL models for clustering and beamsteering in terms of network performance, convergence rate, and computational complexity. Results show a significant rate enhancement due to the proposed hybrid beamforming scheme compared to its conventional all-digital counterpart. This performance enhancement becomes more significant as the number of network partitions increases. For DRL-based network clustering, the policy gradient (PG) algorithm offers the best performance in terms of stability and convergence rate, while the state-action-reward-state-action (SARSA) algorithm suffers from significant variance, slower convergence, and slightly inferior performance compared with the other algorithms. For DRL-based beamsteering, the soft actor-critic (SAC) algorithm with continuous action space shows the best performance. Also, online training of the agents with varying channel state information (CSI) is observed to increase the variance of the Q-values and decrease the convergence rate, with no significant effect on the average reward. The simulation codes are available at: https://github.com/yasser-aleryani/mmWaveCellFree.git
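For readers unfamiliar with the SARSA algorithm named in this abstract, the following is a minimal tabular sketch of its on-policy update rule in Python. The paper evaluates deep (neural-network) variants, so the tabular Q-table, the function name `sarsa_update`, and the default learning rate and discount factor are illustrative assumptions only.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def sarsa_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    next_action: Hashable,
    alpha: float = 0.1,   # learning rate (assumed value)
    gamma: float = 0.99,  # discount factor (assumed value)
) -> float:
    """Apply one SARSA step and return the temporal-difference error.

    The target uses the action actually chosen in the next state
    (next_action), which is what makes SARSA an on-policy method.
    """
    td_target = reward + gamma * q[(next_state, next_action)]
    td_error = td_target - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


# Usage: a Q-table that returns 0.0 for unseen state-action pairs.
q_table: QTable = defaultdict(float)
sarsa_update(q_table, "s0", "a0", 1.0, "s1", "a1")
```

Because the target depends on the sampled next action rather than on a maximum over actions, the update inherits the randomness of the exploration policy, which is one commonly cited source of variance in SARSA-style learning.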

Articles published on Continuous Action Spaces

Marine route optimization using reinforcement learning approach to reduce fuel consumption and consequently minimize CO2 emissions

Data driven control based on Deep Q-Network algorithm for heading control and path following of a ship in calm water and waves

Deep Reinforcement Learning Based Incentive Mechanism Design for Platoon Autonomous Driving With Social Effect

Improving the Robustness of Reinforcement Learning Policies With ${\mathcal {L}_{1}}$ Adaptive Control

Stochastic Planner-Actor-Critic for Unsupervised Deformable Image Registration

A Distributional Framework for Risk-Sensitive End-to-End Planning in Continuous MDPs

Decentralized Mean Field Games

Deep Reinforcement Learning for Intelligent Dual-UAV Reconnaissance Mission Planning

Crowd navigation in an unknown and complex environment based on deep reinforcement learning

Vulnerability Identification and Remediation of FDI Attacks in Islanded DC Microgrids Using Multiagent Reinforcement Learning

Energy-Efficient Driving for Adaptive Traffic Signal Control Environment via Explainable Reinforcement Learning

Parameterized deep Q-network based energy management with balanced energy economy and battery life for hybrid electric vehicles

Improvements in learning to control perched landings

Navigating Electric Vehicles Along a Signalized Corridor via Reinforcement Learning: Toward Adaptive Eco-Driving Control

Highway Decision-Making and Motion Planning for Autonomous Driving via Soft Actor-Critic

Network Selection Based on Evolutionary Game and Deep Reinforcement Learning in Space-Air-Ground Integrated Network

Self-Organizing mmWave MIMO Cell-Free Networks With Hybrid Beamforming: A Hierarchical DRL-Based Design

A prescriptive Dirichlet power allocation policy with deep reinforcement learning

Policy search with rare significant events: Choosing the right partner to cooperate with.

Joint Optimization for Mobile Edge Computing-Enabled Blockchain Systems: A Deep Reinforcement Learning Approach.
