Abstract
Despite being a widely adopted development framework for unmanned aerial vehicle (UAV), deep reinforcement learning is often considered sample inefficient. Particularly, UAV struggles to fully explore the state and action space in environments with sparse rewards. While some exploration algorithms have been proposed to overcome the challenge of sparse rewards, they are not specifically tailored for UAV platform. Consequently, applying those algorithms to UAV path planning may lead to problems such as unstable training processes and neglect of action space comprehension, possibly causing negative impacts on the path planning results. To address the problem of sparse rewards in UAV path planning, we propose an information-theoretic exploration algorithm, Entropy Explorer (EE), specifically for UAV platform. The proposed EE generates intrinsic rewards based on state entropy and action entropy to compensate for the scarcity of extrinsic rewards. To further improve sampling efficiency, a framework integrating EE and Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithms is proposed. Finally, the TD3-EE algorithm is tested in AirSim and compared against benchmarking algorithms. The simulation outcomes manifest that TD3-EE effectively stimulates the UAV to comprehensively explore both state and action spaces, thereby attaining superior performance compared to the benchmark algorithms in the realm of path planning.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.