Abstract
Unmanned surface vehicles (USVs) have been widely used in research and exploration, patrol, and defense. Autonomous navigation and obstacle avoidance are essential USV technologies and key conditions for successful mission execution. However, conventional algorithms that rely on fine-grained modeling cannot meet the requirements of real-time, precise behavior control for USVs in complex environments, which poses a great challenge for autonomous control policies. In this paper, a deep reinforcement learning-based UANOA (USVs autonomous navigation and obstacle avoidance) method is proposed. UANOA accomplishes the autonomous navigation task by sensing, in real time, partial information about the complex ocean environment around the USV and outputting rudder angle control commands in real time. In our work, we employ a double Q-network to achieve end-to-end control from raw sensor input to discrete rudder actions, and we design a set of reward functions adapted to USV navigation and obstacle avoidance. To alleviate the decision bias caused by the partial observability of the USV's environment, we use long short-term memory (LSTM) networks to enhance the USV's ability to remember the ocean environment. Experiments demonstrate that UANOA ensures a USV arrives at the target points with optimal path planning in complex ocean environments without any collisions, and that UANOA outperforms the deep Q-network (DQN) and a random control policy in convergence speed, sailing distance, rudder angle steering consumption, and other performance measurements.
Highlights
Unmanned surface vehicles (USVs) are primarily used to perform tasks that are dangerous and unsuitable for manned vessels
We introduce Markov Decision Processes (MDPs), which are typically used to solve complex time-series decision tasks, for modeling; we then evaluate the advantages of deep reinforcement learning with double Q-learning, and we combine the observation space and control of the USV to derive the UANOA algorithm
We use an MDP to model the USV navigation and obstacle avoidance task. The proposed UANOA algorithm contains the MDP framework, and eventually an optimal strategy π is learned by the UANOA algorithm to achieve autonomous navigation of the USV
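The double Q-learning idea named in the highlights above can be illustrated with a minimal sketch of the standard double Q-learning target, in which the online network selects the next action and the target network evaluates it. This is not the authors' implementation: the function name `double_q_target`, the three discrete rudder actions, and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def double_q_target(reward: float, done: bool, gamma: float,
                    q_online_next: Sequence[float],
                    q_target_next: Sequence[float]) -> float:
    """Standard double Q-learning target for one transition.

    q_online_next / q_target_next hold the Q-values of every discrete
    action in the next state, as estimated by the online and target
    networks respectively (hypothetical inputs for this sketch).
    """
    if done:
        # Terminal transition (e.g. goal reached or collision): no bootstrap.
        return reward
    # Action selection uses the online network ...
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # ... while action evaluation uses the target network.
    return reward + gamma * q_target_next[best_action]


# Hypothetical next-state Q-values for three rudder actions: left, hold, right.
q_online_next = [0.2, 1.0, 0.5]
q_target_next = [0.4, 0.6, 2.0]

y_double = double_q_target(1.0, False, 0.9, q_online_next, q_target_next)
# Plain DQN target for comparison: the target network both selects and evaluates.
y_dqn = 1.0 + 0.9 * max(q_target_next)
```

In this made-up example the double Q-learning target is smaller than the plain DQN target, because the overestimated value of the third action in the target network is not selected by the online network. Decoupling selection from evaluation in this way is the usual motivation for preferring double Q-learning over DQN.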
Summary
USVs are primarily used to perform tasks that are dangerous and unsuitable for manned vessels. When a USV is equipped with a variety of customized sensors, communication devices, and other equipment, it has greater flexibility and intelligence to perform a variety of complex maritime tasks [1, 2]. Combined with other unmanned systems, USVs can form rich clusters of unmanned systems in the ocean, capable of handling more complex maritime missions [3, 4]. USVs encounter different marine environments in different mission scenarios and often fail in their missions due to the harsh marine environment. Therefore, the autonomous navigation and obstacle avoidance capabilities of USVs are highly required. That is to say, under certain constraints, the USV departs from the initial location and adjusts its navigation route in real time according to changes in the external environment in order to reach the final destination.