Abstract
The task of obstacle avoidance for maritime vessels, such as Unmanned Surface Vehicles (USVs), has traditionally been solved using specialized modules that are designed and optimized separately. However, this approach requires deep insight into the environment, the vessel, and their complex dynamics. We propose an alternative method using Imitation Learning (IL) through Deep Reinforcement Learning (RL) and Deep Inverse Reinforcement Learning (IRL), and present a system that learns an end-to-end steering model capable of mapping radar-like images directly to steering actions in an obstacle avoidance scenario. The USV used in this work is equipped with a radar sensor, and we study the problem of generating a single action parameter, heading. We apply an IL algorithm known as Generative Adversarial Imitation Learning (GAIL) to develop an end-to-end steering model for a scenario where the goal is to avoid an obstacle. The performance of the system was studied for different design choices and compared to that of a system based on pure RL. The IL system produces results indicating that it grasps the concept of the task, and that are in many ways on par with those of the RL system. We deem this promising for future use in tasks that are not as easily described by a reward function.
Highlights
We consider the design of an autonomous system for steering an unmanned surface vehicle (USV) using an end-to-end approach, where the system directly generates action parameters based on the sensory data.
3.1 Reinforcement Learning
In RL, the process of the agent interacting with its environment and the resulting reward is formulated as a Markov Decision Process (MDP), a tuple ⟨X, U, P, R⟩.
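To make the ⟨X, U, P, R⟩ tuple concrete, the following minimal sketch (not from the paper; the three-state MDP, its transition matrices, and rewards are invented for illustration) represents each element as a numpy array and solves the MDP with value iteration:

```python
import numpy as np

# Hypothetical 3-state, 2-action MDP illustrating the <X, U, P, R> tuple.
# P[u, x, y] is the probability of moving from state x to state y under
# action u; R[x, u] is the immediate reward for taking u in x.
n_states, n_actions, gamma = 3, 2, 0.9
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]],  # action 0
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],  # action 1
])
R = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 2.0]])

# Value iteration: V(x) <- max_u [ R(x, u) + gamma * sum_y P(y|x, u) V(y) ]
V = np.zeros(n_states)
for _ in range(500):
    Q = R + gamma * np.einsum("uxy,y->xu", P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-9:
        break
    V = V_new

print(np.round(V, 3))  # optimal state values
```

State 2 is absorbing under action 1 with reward 2, so its value converges to 2/(1 − γ) = 20; the other values are strictly lower, which a greedy policy over Q recovers.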
We have presented a system which learns an end-to-end steering model through Generative Adversarial Imitation Learning (GAIL) based Imitation Learning (IL) and the use of a set of expert demonstrations.
Summary
We consider the design of an autonomous system for steering an unmanned surface vehicle (USV) using an end-to-end approach, where the system directly generates action parameters based on the sensory data. Artificial Neural Networks (ANNs), which offer a high level of expressive power, are a suitable candidate for implementing such a system. They are able to handle highly nonlinear relationships between their input and output [12, 4], an ability which is necessary in order to perform end-to-end steering in USVs. Systems that couple deep ANNs with Reinforcement Learning (RL) have proved able to learn complex tasks [14, 11, 10, 8, 3, 7, 15]. At each time step t = 0, 1, 2, 3, ..., the agent experiences the state of the environment, x ∈ X, and must decide on some action, u ∈ U(x). We want to maximize the expected discounted reward of the policy π:

J(π) = E_π [ Σ_{t=0}^{∞} γ^t R(x_t, u_t) ],

where γ ∈ [0, 1) is the discount factor.
Published in: Proceedings of the Northern Lights Deep Learning Workshop