Abstract
This letter addresses two challenges facing sampling-based kinodynamic motion planning: identifying good candidate states for local transitions, and the computationally intractable steering between those candidate states. By combining sampling-based planning, a Rapidly-exploring Random Tree (RRT), and a machine-learned kinodynamic local planner, we propose an efficient solution to long-range kinodynamic motion planning. First, we use deep reinforcement learning to learn an obstacle-avoiding policy that maps a robot's sensor observations to actions; this policy serves as a local planner during planning and as a controller during execution. Second, we train a reachability estimator in a supervised manner to predict the RL policy's time to reach a state in the presence of obstacles. Lastly, we introduce RL-RRT, which uses the RL policy as a local planner and the reachability estimator as the distance function to bias tree growth toward promising regions. We evaluate our method on three kinodynamic systems, including physical robot experiments. Results across all three robots indicate that RL-RRT outperforms state-of-the-art kinodynamic planners in efficiency and also yields a shorter path finish time than a steering-function-free method. The learned local-planner policy and the accompanying reachability estimator transfer to previously unseen experimental environments, and RL-RRT is fast because the expensive steering computations are replaced with simple neural network inference.
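To make the tree-extension loop described above concrete, here is a minimal, hypothetical Python sketch for a 2-D point robot. The names rl_policy, reachability_time, and step_dynamics are stand-ins of our own (in the letter these are a learned obstacle-avoiding policy, a learned time-to-reach estimator, and the robot's true kinodynamics), and obstacle checking is omitted for brevity.

```python
# Hypothetical sketch of the RL-RRT extension loop; not the letter's code.
import math
import random

def rl_policy(state, goal):
    """Stand-in local planner: heading that steers straight at the goal."""
    return math.atan2(goal[1] - state[1], goal[0] - state[0])

def reachability_time(state, candidate):
    """Stand-in reachability estimator: Euclidean proxy for time-to-reach."""
    return math.dist(state, candidate)

def step_dynamics(state, heading, dt=0.1, speed=1.0):
    """Stand-in forward model: constant-speed unicycle step."""
    return (state[0] + speed * math.cos(heading) * dt,
            state[1] + speed * math.sin(heading) * dt)

def rl_rrt(start, goal, iters=2000, goal_bias=0.1, horizon=25, tol=0.3):
    tree = {start: None}  # node -> parent, for path reconstruction
    for _ in range(iters):
        # Goal-biased sampling of a candidate state.
        sample = goal if random.random() < goal_bias else \
            (random.uniform(0.0, 10.0), random.uniform(0.0, 10.0))
        # Reachability estimator as the distance function: expand the node
        # the policy is predicted to reach the sample from fastest.
        nearest = min(tree, key=lambda n: reachability_time(n, sample))
        # RL policy as the local planner: roll it out toward the sample under
        # the dynamics instead of solving a two-point steering problem.
        state = nearest
        for _ in range(horizon):
            state = step_dynamics(state, rl_policy(state, sample))
        tree[state] = nearest
        if math.dist(state, goal) <= tol:
            path = [state]
            while tree[path[-1]] is not None:
                path.append(tree[path[-1]])
            return path[::-1]
    return None

if __name__ == "__main__":
    print(rl_rrt(start=(0.0, 0.0), goal=(9.0, 9.0)))
```

The key design choice the letter describes is visible in the two marked steps: nearest-neighbor selection uses predicted time-to-reach rather than a geometric metric, and extension rolls out the policy rather than calling a steering function.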
Highlights
Consider motion planning for robots such as UAVs [16], autonomous ships [3], and spacecraft [22]
To address the challenges facing kinodynamic motion planning, namely the lack of available steering functions, of good distance functions for guiding tree growth, and of obstacle awareness, we propose RL-RRT, which combines reinforcement learning (RL) with a sampling-based planner, the Rapidly-exploring Random Tree (RRT)
To train a policy robust to noise, we model the RL policy as the solution to a continuous-state, continuous-action, partially observable Markov decision process (POMDP) given as a tuple (Ω, S, A, D, R, γ, O) of observations, states, actions, dynamics, reward, a scalar discount γ ∈ (0, 1), and observation probabilities, formalized below
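For concreteness, the standard discounted-return objective for such a POMDP, with a memoryless policy π acting on the current observation as is common in deep RL, can be written as follows (a generic formalization under the tuple above, not quoted from the letter):

```latex
% Generic POMDP objective for the tuple (\Omega, S, A, D, R, \gamma, O);
% the letter's specific reward shaping is not reproduced here.
\pi^{*} = \arg\max_{\pi}\;
\mathbb{E}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t}\, R(s_t, a_t)\Big],
\qquad a_t \sim \pi(\cdot \mid o_t),\;
o_t \sim O(\cdot \mid s_t),\;
s_{t+1} \sim D(\cdot \mid s_t, a_t).
```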
Summary
Consider motion planning for robots such as UAVs [16], autonomous ships [3], and spacecraft [22].