Abstract

It is crucial for robots to steer autonomously and safely in complex environments without colliding with obstacles. Compared to conventional methods, deep reinforcement learning-based methods can learn from past experience automatically and generalize better to unseen circumstances. We therefore propose an end-to-end deep reinforcement learning algorithm in this paper to improve the performance of autonomous steering in complex environments. By embedding a branching noisy dueling architecture, the proposed model derives steering commands directly from raw depth images with high efficiency. Specifically, our learning-based approach extracts a feature representation from the depth inputs through convolutional neural networks and maps it simultaneously to linear and angular velocity commands through different streams of the network. Moreover, the training framework is carefully designed to improve learning efficiency and effectiveness. Notably, by relying on depth images, the developed system transfers readily from virtual training scenarios to real-world deployment without any fine-tuning. The proposed method is evaluated and compared with a series of baseline methods in various virtual environments. Experimental results demonstrate the superiority of the proposed model in terms of average reward, learning efficiency, success rate, and computational time. Furthermore, a variety of real-world experiments are conducted, which reveal the high adaptability of our model to both static and dynamic obstacle-cluttered environments.
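To make the described architecture concrete, the following is a minimal sketch of a branching dueling Q-network over depth inputs, assuming a PyTorch implementation. The layer sizes, input resolution, and action discretization are illustrative assumptions rather than the authors' exact configuration, and the noisy exploration layers of the full BND model are omitted for brevity.

```python
# Illustrative sketch only: sizes and discretization are assumptions.
import torch
import torch.nn as nn

class BranchingDuelingQNet(nn.Module):
    def __init__(self, n_linear_actions=5, n_angular_actions=5):
        super().__init__()
        # CNN encoder for a stack of depth frames (4 x 80 x 100 assumed here).
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        feat_dim = self._feature_dim((4, 80, 100))
        # Shared state-value stream; the paper's noisy layers for exploration
        # are replaced by plain linear layers in this sketch.
        self.value = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, 1))
        # One advantage branch per action dimension.
        self.adv_linear = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, n_linear_actions))
        self.adv_angular = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, n_angular_actions))

    def _feature_dim(self, shape):
        with torch.no_grad():
            return self.encoder(torch.zeros(1, *shape)).shape[1]

    def forward(self, depth):
        h = self.encoder(depth)
        v = self.value(h)
        a_lin, a_ang = self.adv_linear(h), self.adv_angular(h)
        # Dueling aggregation per branch: Q = V + (A - mean(A)).
        q_lin = v + a_lin - a_lin.mean(dim=1, keepdim=True)
        q_ang = v + a_ang - a_ang.mean(dim=1, keepdim=True)
        return q_lin, q_ang
```

The design point illustrated here is that a single convolutional encoder and state-value stream are shared, while each action dimension (linear and angular velocity) receives its own advantage branch, so both commands are produced in one forward pass.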

Highlights

  • Autonomous robots have been built for a wide range of applications, such as surveillance, exploration, data collection, inspection, rescue and service, etc. [1]

  • We obtain depth images from a Kinect and derive the control commands directly from the depth inputs based on the Q-values estimated by the BND-double DQN (DDQN) model (an illustrative action-selection sketch follows this list).

  • It is worth noting that the BND-DDQN model trained in the virtual environments is directly transferred to real-world deployment without any fine-tuning of parameters, and it is shown that our model adapts well to unseen real-world scenarios by adopting depth images as the inputs to the network.
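As referenced in the second highlight, the sketch below illustrates how greedy velocity commands could be read off the two Q-value branches of the network defined earlier. The velocity discretizations are assumptions for illustration, not the paper's actual action set.

```python
import torch

# Assumed discretizations of the two action dimensions (illustrative only).
LINEAR_VELS = [0.0, 0.1, 0.2, 0.3, 0.4]      # m/s
ANGULAR_VELS = [-1.0, -0.5, 0.0, 0.5, 1.0]   # rad/s

@torch.no_grad()
def select_command(model, depth_stack):
    """Greedy linear/angular velocities from a (1, C, H, W) depth tensor."""
    q_lin, q_ang = model(depth_stack)            # one Q-vector per branch
    v = LINEAR_VELS[q_lin.argmax(dim=1).item()]
    w = ANGULAR_VELS[q_ang.argmax(dim=1).item()]
    return v, w                                  # e.g., published as a velocity command
```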


Summary

Introduction

Autonomous robots have been built for a wide range of applications, such as surveillance, exploration, data collection, inspection, rescue, and service [1]. The steering task is formulated as a Markov decision process (MDP). In an MDP, a policy π(a|s) specifies the probability of mapping state s to action a. The action-value function is the expectation of the discounted sum of rewards, given that action a is taken in state s and policy π is executed afterwards. The objective of the agent is to maximize the expected cumulative future reward, which can be achieved with the Q-learning algorithm, approximating the optimal action-value function iteratively using the Bellman equation: $Q^*(s_t, a_t) = R(s_t, a_t) + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1})$.
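As a concrete illustration of the Bellman update above, here is a minimal tabular Q-learning step; the interface and hyper-parameters are assumptions for illustration (the paper itself uses a deep double DQN variant, in which a separate target network evaluates the bootstrapped action rather than a lookup table).

```python
import numpy as np

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular update towards the Bellman target r + gamma * max_a' Q(s', a')."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```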
