Abstract
Tactical UAV path planning under radar threat using reinforcement learning poses particular challenges, ranging from modeling difficulties to the sparse-feedback problem. Learning goal-directed behavior from complex environments with sparse feedback is a fundamental challenge for reinforcement learning algorithms. In this paper we extend our previous work in this area to address the problem setting stated above, using Hierarchical Reinforcement Learning (HRL) in a novel way that combines a meta controller for higher-level goal assignment with a controller that determines the agent's lower-level actions. The meta controller is a regression model trained using a state-transition scheme that defines the evolution of goal designation, whereas the lower-level controller is a Deep Q-Network (DQN) trained via reinforcement learning iterations. This two-layer framework ensures that an optimal plan for a complex path, organized as multiple goals, is achieved gradually through piecewise assignment of sub-goals, i.e., through a staged, efficient and rigorous procedure.
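The two-layer loop described above can be sketched in miniature. The following is a hypothetical illustration, not the paper's implementation: the world is a 1-D line, the meta controller is replaced by a fixed sub-goal sequence (standing in for the regression-based goal-transition model), and the low-level controller uses tabular Q-learning as a simplified stand-in for the DQN. All names, reward values, and hyperparameters are assumptions for the sketch.

```python
import random

# Assumed toy setting: agent starts at position 0 and must reach 10.
SUBGOALS = [3, 7, 10]          # hypothetical sub-goal sequence (meta-controller output)
ACTIONS = [-1, +1]             # move left / move right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2

Q = {}                         # Q[(position, subgoal, action)] -> value

def q(s, g, a):
    return Q.get((s, g, a), 0.0)

def act(s, g, eps=EPS):
    """Epsilon-greedy low-level action for the current sub-goal."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q(s, g, a))

def train(episodes=500):
    for _ in range(episodes):
        pos = 0
        for goal in SUBGOALS:          # meta level: assign sub-goals in sequence
            for _ in range(50):        # low level: pursue the current sub-goal
                a = act(pos, goal)
                nxt = max(0, min(10, pos + a))
                # Intrinsic reward: positive only when the sub-goal is reached,
                # mimicking sparse feedback within each stage.
                r = 1.0 if nxt == goal else -0.1
                best = max(q(nxt, goal, b) for b in ACTIONS)
                Q[(pos, goal, a)] = q(pos, goal, a) + ALPHA * (r + GAMMA * best - q(pos, goal, a))
                pos = nxt
                if pos == goal:
                    break

def run_greedy():
    """Follow the learned policy greedily through all sub-goals."""
    pos, path = 0, [0]
    for goal in SUBGOALS:
        for _ in range(50):
            pos = max(0, min(10, pos + act(pos, goal, eps=0.0)))
            path.append(pos)
            if pos == goal:
                break
    return path

random.seed(0)
train()
print(run_greedy()[-1])
```

The point of the sketch is the decomposition: the low-level learner only ever optimizes against the current sub-goal, so the sparse terminal reward of the full path is replaced by a sequence of much denser sub-problems.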