Abstract

Previous approaches for small fixed-wing unmanned air systems that carry strapdown rather than gimbaled cameras achieved satisfactory ground target tracking performance using both standard and deep reinforcement learning algorithms. However, those approaches imposed significant restrictions on, and abstractions of, the vehicle dynamics, such as constant airspeed and constant altitude, because the number of states and actions had to be kept small; as a result, extensive tuning was required to obtain good tracking performance. Expanding from 4 to 15 state–action degrees of freedom enabled the agent to exploit the earlier reward functions, producing novel yet undesirable emergent behavior. This paper investigates the causes of, and potential solutions to, undesirable emergent behavior in the ground target tracking problem. Changes to the environment and reward structure, simplification of the action space, and adjustments to the command rate and controller implementation together provide insight into obtaining stable tracking results, with particular attention to selecting and refining the reward structure to mitigate undesirable emergent behavior. Results for a simulated environment in which a single unmanned air system tracks a randomly moving ground target show that a soft actor–critic algorithm can produce feasible tracking trajectories without limiting the state and action spaces, provided that the environment is properly posed.
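
As a rough illustration of the kind of setup the abstract describes, the sketch below trains a soft actor–critic agent on a continuous-action tracking environment using Gymnasium and Stable-Baselines3. The environment class, observation/action dimensions, and placeholder dynamics and reward are illustrative assumptions only, not the paper's actual implementation.

    # Minimal, hypothetical sketch: a custom Gymnasium environment for
    # single-UAS ground target tracking, trained with the soft actor-critic
    # (SAC) implementation from Stable-Baselines3. All names, dimensions,
    # and reward terms are assumptions for illustration.
    import numpy as np
    import gymnasium as gym
    from gymnasium import spaces
    from stable_baselines3 import SAC

    class TrackingEnv(gym.Env):
        """Hypothetical single-UAS / single-ground-target environment."""

        def __init__(self):
            # Continuous observations of relative vehicle/target state;
            # the split of the 15 state-action degrees of freedom into
            # observations and actions is an assumption.
            self.observation_space = spaces.Box(
                -np.inf, np.inf, shape=(12,), dtype=np.float32)
            # E.g., commanded roll, pitch, and throttle, leaving airspeed
            # and altitude unconstrained rather than fixed.
            self.action_space = spaces.Box(
                -1.0, 1.0, shape=(3,), dtype=np.float32)

        def reset(self, seed=None, options=None):
            super().reset(seed=seed)
            self.state = np.zeros(12, dtype=np.float32)
            return self.state, {}

        def step(self, action):
            # A real environment would propagate full fixed-wing dynamics
            # and a randomly moving ground target here; this is a stand-in.
            reward = 0.0  # shaped to keep the target in the strapdown
                          # camera footprint while penalizing infeasible
                          # maneuvers (illustrative only)
            terminated, truncated = False, False
            return self.state, reward, terminated, truncated, {}

    env = TrackingEnv()
    model = SAC("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=10_000)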
