Abstract

To address the challenge of maintaining robust target tracking in a dynamic 3D high-altitude scenario, this paper presents a method for formulating continuous strategic maneuvers for unmanned combat air vehicles (UCAVs) based on the deep deterministic policy gradient (DDPG). DDPG is an efficient reinforcement learning approach that enables a UCAV to perform a variety of navigation tasks in real time in a dynamic, stochastic electronic warfare environment, and it therefore offers clear advantages over alternative techniques. First, a target tracking simulator, Tracker, is built within the cognitive electronic warfare framework, and a theoretical analysis is conducted of the maneuvering bias produced by environmental observation errors. Tracker automatically relates the maximum physical overload to the UCAV’s attitude angles and the desired movement commands. Second, the agent’s behavior rewards are shaped, inspired by vector-based navigation, to ensure that the DDPG’s output is reliable. Finally, the DRL-based navigation decision framework is validated in simulation on target tracking tasks in different environments and achieves strong results. For behavior assessment, the agile maneuvers mastered by the agent are dissected via time segmentation of high-quality trajectories.
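
The abstract does not give Tracker's state variables or reward weights; purely as a hedged illustration of the vector-based reward shaping it mentions, the minimal sketch below computes a shaped reward from the UCAV–target line of sight. The function name, the weights, and the split into a progress term and a heading-alignment term are assumptions for illustration, not the authors' actual formulation.

```python
import numpy as np

def shaped_reward(uav_pos, uav_heading, target_pos, prev_distance,
                  w_dist=1.0, w_heading=0.5):
    """Illustrative vector-based shaped reward (not the paper's exact form).

    uav_pos, target_pos : 3D position vectors (numpy arrays)
    uav_heading         : unit vector of the UCAV's current velocity direction
    prev_distance       : UCAV-target distance at the previous time step
    Returns the scalar reward and the current distance (fed back as
    prev_distance on the next step).
    """
    los = target_pos - uav_pos                      # line-of-sight vector
    distance = np.linalg.norm(los)
    los_unit = los / (distance + 1e-8)

    # Progress term: positive when the UCAV closes the distance this step.
    r_progress = w_dist * (prev_distance - distance)

    # Alignment term: cosine of the angle between heading and line of sight.
    r_heading = w_heading * float(np.dot(uav_heading, los_unit))

    return r_progress + r_heading, distance


if __name__ == "__main__":
    # Toy single-step example with arbitrary positions.
    uav = np.array([0.0, 0.0, 8000.0])
    heading = np.array([1.0, 0.0, 0.0])
    target = np.array([5000.0, 1000.0, 8500.0])
    r, d = shaped_reward(uav, heading, target, prev_distance=5200.0)
    print(f"reward={r:.2f}, distance={d:.1f} m")
```

A dense, vector-based signal of this kind gives the DDPG agent feedback at every time step, rather than only on capture or loss of the target, which is one common motivation for such shaping.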
