Completing Explorer Games with a Deep Reinforcement Learning Framework Based on Behavior Angle Navigation

Shixun You, Ming Diao, Lipeng Gao

doi:10.3390/electronics8050576

Open Access

https://doi.org/10.3390/electronics8050576

Journal: Electronics
Publication Date: May 25, 2019
Citations: 4
License type: CC BY 4.0

Affiliation: Harbin Engineering University

Abstract

In cognitive electronic warfare, when a typical combat vehicle, such as an unmanned combat air vehicle (UCAV), uses radar sensors to explore an unknown space, the target-searching fails due to an inefficient servoing/tracking system. Thus, to solve this problem, we developed an autonomous reasoning search method that can generate efficient decision-making actions and guide the UCAV as early as possible to the target area. For high-dimensional continuous action space, the UCAV’s maneuvering strategies are subject to certain physical constraints. We first record the path histories of the UCAV as a sample set of supervised experiments and then construct a grid cell network using long short-term memory (LSTM) to generate a new displacement prediction to replace the target location estimation. Finally, we enable a variety of continuous-control-based deep reinforcement learning algorithms to output optimal/sub-optimal decision-making actions. All these tasks are performed in a three-dimensional target-searching simulator, i.e., the Explorer game. Please note that we use the behavior angle (BHA) for the first time as the main factor of the reward-shaping of the deep reinforcement learning framework and successfully make the trained UCAV achieve a 99.96% target destruction rate, i.e., the game win rate, in a 0.1 s operating cycle.

Highlights

The cognitive degree of electronic warfare depends on the adaptability of the autonomous decision-making system of combat vehicles to various tasks
When distance-based rewards are used, the soft actor-critic (SAC) algorithm shows some improvement in win rate (WR) at all task levels, and the performance of the proximal policy optimization algorithms PPO (KL) and PPO (CLIP) changes drastically, while the performance of the asynchronous advantage actor-critic (A3C) and deep deterministic policy gradient (DDPG) algorithms hardly changes
The behavior angle (BHA)-driven incentive brought the deep reinforcement learning (DRL) framework to state-of-the-art performance: in the most difficult game, algorithms including SAC show an obvious convergence trend, and the PPO algorithm even reaches 88.68% WR in 8 hours
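
The contrast between distance-based and behavior-angle-based rewards can be illustrated with a minimal Python sketch. This page does not give the paper's exact reward definitions, so everything here is an assumption for illustration: the behavior angle is taken to be the angle between the UCAV's velocity vector and the line of sight to the target, and the function names and the linear mapping to [-1, 1] are hypothetical.

```python
import math

def distance_reward(prev_pos, pos, target):
    """Distance-based shaping (assumed form): positive when the step
    reduced the range to the target, negative when it increased it."""
    return math.dist(prev_pos, target) - math.dist(pos, target)

def behavior_angle(velocity, pos, target):
    """Angle in radians between the velocity vector and the line of sight
    to the target. This definition of the BHA is an assumption."""
    line_of_sight = [t - p for t, p in zip(target, pos)]
    speed = math.hypot(*velocity)
    target_range = math.hypot(*line_of_sight)
    if speed == 0.0 or target_range == 0.0:
        return 0.0  # angle is undefined; treat as aligned
    cos_angle = sum(v * l for v, l in zip(velocity, line_of_sight))
    cos_angle /= speed * target_range
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_angle)))

def behavior_angle_reward(velocity, pos, target):
    """Hypothetical shaping: +1 when flying straight at the target,
    0 when flying perpendicular to it, -1 when flying directly away."""
    return 1.0 - 2.0 * behavior_angle(velocity, pos, target) / math.pi
```

Under these assumptions, the angle-based reward depends only on heading, so it gives the same signal at any range, whereas the distance-based reward scales with how far the vehicle moved in one step.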

Summary

Introduction

The cognitive degree of electronic warfare depends on the adaptability of the autonomous decision-making system of combat vehicles to various tasks. The method proposed in this paper for UCAV control is also applicable to conventional UAVs. For factor 2, because all states and behaviors of the UCAV in Explorer can be characterized and the simulation environment can provide rich and direct multiagent interactions, we consider using deep reinforcement learning (DRL) algorithms to obtain an approximately end-to-end optimal control strategy for the UCAV. In this setting, the input of the system is the UCAV’s observation state, and the output is the planned acceleration vector of the UCAV.
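
As a rough illustration of that interface, the sketch below applies a planned acceleration vector over one 0.1 s operating cycle (the cycle length quoted in the abstract). The acceleration bound, the norm-clipping rule, and the simple Euler update are assumptions made for this example; the paper's actual physical constraints and motion model are not reproduced on this page.

```python
import math

MAX_ACCEL = 1.0  # hypothetical bound on acceleration magnitude
DT = 0.1         # operating cycle in seconds, as quoted in the abstract

def clip_acceleration(accel, max_norm=MAX_ACCEL):
    """Scale the acceleration vector down so its norm does not exceed
    max_norm, keeping its direction (assumed form of the constraint)."""
    norm = math.hypot(*accel)
    if norm <= max_norm:
        return tuple(accel)
    scale = max_norm / norm
    return tuple(a * scale for a in accel)

def step(pos, vel, accel, dt=DT):
    """Advance position and velocity by one cycle using a semi-implicit
    Euler update with the clipped acceleration."""
    clipped = clip_acceleration(accel)
    new_vel = tuple(v + a * dt for v, a in zip(vel, clipped))
    new_pos = tuple(p + v * dt for p, v in zip(pos, new_vel))
    return new_pos, new_vel
```

A trained policy would supply `accel` from the observation state at every cycle; here it is simply passed in by the caller.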

Target-Searching in CEW

Deep Reinforcement Learning

Problem Formulation Based on Explorer

Game Environment

Motion State

Action

Reward Shaping

Observation Estimation and Prediction

Path Integration

Cell Activations

Grid Cell Network Architecture

Objective Function

Behavior-Angle-Based Reward

Deep Reinforcement Learning Framework for Continuous Control

Preliminaries

Algorithms

System Implementation and Simulation Details

System Configurations

Simulation Platform of Software and Hardware

Simulation Details and Metrics

Simulation Results

Conclusions
