Abstract
Reinforcement learning promises high performance in complex tasks as well as low online storage and computation cost. However, its trial-and-error approach can lead the agent to explore unsafe behavior while searching for an optimal solution. Run time assurance (RTA) approaches can be applied to monitor behavior and ensure safety constraint satisfaction during reinforcement learning. This paper investigates the effect of RTA on reinforcement learning training performance in terms of training efficiency, safety constraint satisfaction, control efficiency, task efficiency, and training duration. For demonstration, a custom reinforcement learning environment is created in which the objective is to develop a policy that moves a satellite into docking position with another satellite in a two-dimensional relative-motion reference frame. Six policies are trained: the first uses no RTA, the second uses no RTA but a higher penalty for safety violations, and the other four use different RTA techniques to enforce a dynamic velocity constraint during training. The trained policies are analyzed with standardized test points. The policies trained without RTA do not learn to adhere to the constraint, whereas all policies trained with RTA do. Although more complex RTA frameworks can be better for operational use, a simple RTA framework is found to provide the best overall results for reinforcement learning training.
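To make the idea of RTA during training concrete, the following is a minimal sketch of a simple switching filter that sits between the policy and the environment. It is not the paper's implementation. The linear form of the distance-dependent speed limit, the double-integrator dynamics (a stand-in for the actual relative-motion dynamics), the braking backup action, and every constant and function name are assumptions chosen for illustration.

```python
import math

# All constants below are illustrative assumptions, not values from the paper.
V0 = 0.2      # allowed speed at zero distance (m/s)
V1 = 0.002    # extra allowed speed per metre of distance (1/s)
DT = 1.0      # time step (s)
MASS = 12.0   # deputy satellite mass (kg)
U_MAX = 1.0   # maximum thrust per axis (N)


def speed_limit(pos):
    """Assumed dynamic velocity constraint: the limit grows with distance."""
    return V0 + V1 * math.hypot(pos[0], pos[1])


def step(pos, vel, force):
    """One step of a 2-D double integrator (simplified stand-in dynamics)."""
    new_vel = (vel[0] + force[0] / MASS * DT, vel[1] + force[1] / MASS * DT)
    new_pos = (pos[0] + new_vel[0] * DT, pos[1] + new_vel[1] * DT)
    return new_pos, new_vel


def is_safe(pos, vel):
    """True when the speed is within the distance-dependent limit."""
    return math.hypot(vel[0], vel[1]) <= speed_limit(pos)


def backup_action(vel):
    """Hypothetical backup controller: thrust against the current velocity."""
    def clip(u):
        return max(-U_MAX, min(U_MAX, u))
    return (clip(-MASS * vel[0] / DT), clip(-MASS * vel[1] / DT))


def rta_filter(pos, vel, desired):
    """Pass the desired action through if the predicted next state is safe;
    otherwise substitute the backup action. Returns (action, intervened)."""
    next_pos, next_vel = step(pos, vel, desired)
    if is_safe(next_pos, next_vel):
        return desired, False
    return backup_action(vel), True
```

In a training loop, the action returned by `rta_filter` would be sent to the environment in place of the policy's raw output, and the `intervened` flag could be logged or used in the reward. A one-step check like this does not by itself guarantee that the constraint holds at every future step; it only illustrates the monitor-and-override pattern.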