Abstract
Crucial real-world problems in robotics, such as trajectory planning during convoy missions and autonomous rescue missions, can be framed as a confinement escape problem (CEP), a type of pursuit-evasion game. In a typical CEP, an evader attempts to escape a confinement region by sequentially making decisions while the region is patrolled by multiple smart pursuers. The evader has a limited sensing range and knows neither the total number of pursuers nor their pursuit strategies, making it difficult to model the environment and obtain a generalizable escape strategy. In this paper, the CEP is formulated in a reinforcement learning (RL) framework to overcome these difficulties. The state function is designed to be independent of the total number of pursuers and of the shape of the confinement region, thereby making the RL approach scalable. To handle training consistency issues in deep RL and convergence issues due to sparse rewards, a Scaffolding Reflection based Reinforcement Learning (SR2L) approach is presented, in which an actor–critic method is combined with a motion planner scaffold to accelerate training. Performance evaluation shows that SR2L trains roughly twice as fast as existing state-of-the-art actor–critic RL methods, and that its convergence is more consistent than that of conventional actor–critic and interactive reinforcement learning methods. Monte Carlo simulation results show that SR2L outperforms conventional RL methods and the motion planner, with at least 28% and 10% faster escape times respectively, while having the lowest variance in escape times against different pursuit strategies. Ablation studies varying different environmental parameters demonstrate the scalability and generalizability of the proposed SR2L approach.
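The core idea summarized above, using a motion planner as a scaffold to densify an otherwise sparse escape reward, can be sketched in miniature. The snippet below is an illustrative assumption, not the paper's actual implementation: it uses a 1-D grid world, a greedy planner as the scaffold, and tabular Q-learning as a stand-in for the actor–critic learner. All function names and parameters (`planner_action`, `shaped_reward`, `beta`, etc.) are hypothetical.

```python
import random

# Toy sketch of the scaffolding idea: a sparse environment reward is
# augmented with a dense "reflection" bonus whenever the learner's action
# agrees with the motion-planner scaffold's suggestion.

ACTIONS = (-1, 0, 1)  # toy 1-D headings for the evader


def planner_action(pos, goal):
    """Scaffold: a greedy motion planner that heads straight for the goal."""
    return (goal > pos) - (goal < pos)  # sign of (goal - pos)


def shaped_reward(env_reward, agent_action, scaffold_action, beta=0.1):
    """Sparse escape reward plus a dense bonus for matching the scaffold."""
    return env_reward + (beta if agent_action == scaffold_action else 0.0)


def train(episodes=500, goal=5, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning stand-in for the actor-critic learner."""
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        pos = 0
        for _ in range(20):
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: Q.get((pos, b), 0.0))
            nxt = pos + a
            env_r = 1.0 if nxt == goal else 0.0  # sparse: only on escape
            r = shaped_reward(env_r, a, planner_action(pos, goal))
            # standard one-step Q-learning update on the shaped reward
            target = r + gamma * max(Q.get((nxt, b), 0.0) for b in ACTIONS)
            old = Q.get((pos, a), 0.0)
            Q[(pos, a)] = old + alpha * (target - old)
            pos = nxt
            if env_r > 0.0:
                break
    return Q
```

Setting `beta=0` recovers the original sparse-reward problem; a positive `beta` lets the scaffold guide early exploration, which is the intuition behind the faster and more consistent convergence reported in the abstract.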