Abstract

Reinforcement learning promises high performance in complex tasks as well as low online storage and computation costs. However, the trial-and-error learning approach of reinforcement learning may explore unsafe behavior while searching for an optimal solution. Run Time Assurance (RTA) approaches can be applied to monitor behavior and ensure safety constraint satisfaction during reinforcement learning. This paper investigates the effect of RTA on reinforcement learning training performance in terms of training efficiency, safety constraint satisfaction, control efficiency, task efficiency, and training duration. For demonstration, a custom reinforcement learning environment is created in which the objective is to develop a policy that moves a satellite into docking position with another satellite in a two-dimensional relative motion reference frame. Six policies are trained: the first features no RTA, the second features no RTA but a higher penalty for safety violations, and the remaining four use different RTA techniques to enforce a dynamic velocity constraint during training. The trained policies are analyzed with standardized test points. Although more complex RTA frameworks can be better for operational use, it is found that a simple RTA framework provides the best overall results for reinforcement learning training.
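To illustrate the general idea of an RTA filter enforcing a dynamic velocity constraint, the sketch below shows a minimal simple-switching filter: it passes the RL policy's action through when the predicted next-step speed stays under a distance-dependent limit, and otherwise switches to a braking backup controller. The speed-limit coefficients, the backup controller, and the one-step prediction are illustrative assumptions, not the paper's actual constraint or dynamics.

```python
import numpy as np

def safe_speed_limit(pos, v0=0.2, c=2.0):
    # Dynamic velocity constraint: allowed speed grows with distance
    # from the docking target (v0, c are hypothetical coefficients).
    return v0 + c * np.linalg.norm(pos)

def rta_filter(pos, vel, desired_accel, dt=0.1):
    """Simple-switching RTA sketch: if the desired acceleration keeps
    the predicted next-step velocity within the limit, pass it through;
    otherwise switch to a backup controller that brakes.
    Returns (applied_accel, intervened)."""
    next_vel = vel + desired_accel * dt        # one-step velocity prediction
    next_pos = pos + vel * dt                  # one-step position prediction
    if np.linalg.norm(next_vel) <= safe_speed_limit(next_pos):
        return desired_accel, False            # RL action judged safe
    # Backup: decelerate opposite the current velocity direction.
    backup = -0.5 * vel / max(np.linalg.norm(vel), 1e-9)
    return backup, True                        # RTA intervened
```

In a training loop, the filtered acceleration would be applied to the environment in place of the policy's raw output, and the `intervened` flag can be logged to measure how often the RTA activates.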
