Abstract

This letter addresses two challenges facing sampling-based kinodynamic motion planning: a way to identify good candidate states for local transitions and the subsequent computationally intractable steering between these candidate states. By combining sampling-based planning, in the form of a Rapidly-exploring Randomized Tree (RRT), with machine learning, we propose an efficient solution to long-range kinodynamic motion planning. First, we use deep reinforcement learning to learn an obstacle-avoiding policy that maps a robot's sensor observations to actions, which is used as a local planner during planning and as a controller during execution. Second, we train a reachability estimator in a supervised manner to predict the RL policy's time to reach a state in the presence of obstacles. Lastly, we introduce RL-RRT, which uses the RL policy as a local planner and the reachability estimator as the distance function to bias tree growth towards promising regions. We evaluate our method on three kinodynamic systems, including physical robot experiments. Results across all three robots tested indicate that RL-RRT outperforms state-of-the-art kinodynamic planners in efficiency, and also provides a shorter path finish time than a steering-function-free method. The learned local planner policy and accompanying reachability estimator transfer to the previously unseen experimental environments, and RL-RRT is fast because expensive steering computations are replaced with simple neural network inference.
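
As a rough illustration of how these pieces fit together, the sketch below grows a tree whose nearest-node query uses a learned time-to-reach estimate and whose extension step rolls out the obstacle-avoiding policy instead of an analytic steering function. This is a minimal sketch, not the authors' implementation: the toy 2-D state, the callables `rl_policy`, `reach_time`, `collision_free`, and all constants are hypothetical stand-ins.

```python
# Minimal sketch of an RL-RRT-style loop (hypothetical names, toy 2-D state).
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

State = Tuple[float, float]          # toy planar state, stand-in for a full kinodynamic state

@dataclass
class Node:
    state: State
    parent: Optional["Node"] = None

def rl_rrt(start: State,
           goal: State,
           rl_policy: Callable[[State, State], Tuple[float, float]],   # learned local planner
           reach_time: Callable[[State, State], float],                # learned reachability estimator
           collision_free: Callable[[State], bool],
           max_iters: int = 2000,
           horizon: int = 20,
           dt: float = 0.1,
           goal_tol: float = 0.2) -> Optional[List[State]]:
    tree = [Node(start)]
    for _ in range(max_iters):
        # Goal-biased sampling of a candidate state.
        s_rand = goal if random.random() < 0.1 else (random.uniform(0.0, 10.0),
                                                     random.uniform(0.0, 10.0))
        # The learned estimator replaces the Euclidean metric: pick the node
        # predicted to reach the sample fastest, biasing growth toward promising regions.
        nearest = min(tree, key=lambda n: reach_time(n.state, s_rand))
        # Steer by rolling out the obstacle-avoiding policy (no analytic steering function).
        s = nearest.state
        for _ in range(horizon):
            ax, ay = rl_policy(s, s_rand)
            s_next = (s[0] + ax * dt, s[1] + ay * dt)
            if not collision_free(s_next):
                break
            s = s_next
        node = Node(s, parent=nearest)
        tree.append(node)
        if math.dist(s, goal) < goal_tol:
            # Backtrack from the goal-reaching node to recover the path.
            path, n = [], node
            while n is not None:
                path.append(n.state)
                n = n.parent
            return path[::-1]
    return None

# Crude stand-ins for the learned components, only to make the sketch executable:
policy = lambda s, g: (g[0] - s[0], g[1] - s[1])      # greedy go-to-goal action
reach = lambda s, g: math.dist(s, g)                  # optimistic time-to-reach estimate
free = lambda s: not (4.0 < s[0] < 5.0 and s[1] < 7.0)  # single wall obstacle
path = rl_rrt((1.0, 1.0), (9.0, 9.0), policy, reach, free)
```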

Highlights

  • Consider motion planning for robots such as UAVs [16], autonomous ships [3], and spacecraft [22]

  • To address the challenges facing kinodynamic motion planning, namely the lack of available steering functions, the lack of good distance functions for guiding tree growth, and the need for obstacle awareness, we propose Reinforcement Learning (RL)-Rapidly-exploring Randomized Tree (RRT), which combines RL and sampling-based planning

  • To train a policy robust against noise, we model the RL policy as a solution to a continuous-state, continuous-action, partially observable Markov decision process (POMDP) given as a tuple (Ω, S, A, D, R, γ, O) of observations, states, actions, dynamics, reward, scalar discount γ ∈ (0, 1), and observation probability; a minimal code sketch of this tuple follows below
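
The following is an illustrative sketch of how that POMDP tuple might be laid out in code. The unicycle-like dynamics, noise levels, and time-penalty reward are assumptions made for the example, not the paper's exact formulation.

```python
# Toy instantiation of the POMDP tuple (Ω, S, A, D, R, γ, O); all specifics are assumptions.
import numpy as np
from dataclasses import dataclass
from typing import Callable

@dataclass
class POMDP:
    dynamics: Callable[[np.ndarray, np.ndarray], np.ndarray]  # D: s' = D(s, a), with process noise
    reward: Callable[[np.ndarray, np.ndarray], float]         # R(s, a)
    observe: Callable[[np.ndarray], np.ndarray]                # O: noisy observation of the state (Ω)
    gamma: float                                               # scalar discount γ ∈ (0, 1)

def make_toy_pomdp(dt: float = 0.1) -> POMDP:
    """Differential-drive-like example; the RL policy acts on observations only."""
    def dynamics(s: np.ndarray, a: np.ndarray) -> np.ndarray:
        x, y, theta = s                      # S: planar pose
        v, w = a                             # A: linear and angular velocity commands
        s_next = np.array([x + v * np.cos(theta) * dt,
                           y + v * np.sin(theta) * dt,
                           theta + w * dt])
        return s_next + np.random.normal(0.0, 0.01, size=3)   # process noise

    def reward(s: np.ndarray, a: np.ndarray) -> float:
        return -dt                           # constant time penalty rewards reaching the goal quickly

    def observe(s: np.ndarray) -> np.ndarray:
        return s + np.random.normal(0.0, 0.05, size=3)         # noisy sensor reading of the pose

    return POMDP(dynamics=dynamics, reward=reward, observe=observe, gamma=0.99)
```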


Summary

Introduction

Consider motion planning for robots such as UAVs [16], autonomous ships [3], and spacecraft [22].
