HDAO: A Hierarchical Curiosity-Driven Reinforcement Learning Approach for AUV Dynamic Obstacle Avoidance
HDAO is a hierarchical curiosity-driven reinforcement learning algorithm for AUV obstacle avoidance in dynamic, uncertain environments, introducing a Collision Threat Index, task-decoupled architecture, and a novel reward mechanism, resulting in improved success rates, faster convergence, and enhanced robustness over existing methods.
Autonomous obstacle avoidance is a critical capability for Autonomous Underwater Vehicles (AUVs) to operate safely in dynamic and uncertain marine environments. Traditional AUV control methods rely on precise physical modeling and preset rules, yet they struggle to adapt to multiple sources of uncertainty, such as random initial states, dynamic obstacles, and varying currents. In recent years, deep reinforcement learning has provided a new avenue for data-driven adaptive policy learning. However, it remains insufficient for handling long-horizon tasks with sparse rewards. While hierarchical reinforcement learning can mitigate reward sparsity through temporal abstraction, it often faces challenges including exploration–exploitation imbalance, slow global convergence, and insufficient safety guarantees. Furthermore, most existing studies neglect dynamic environmental disturbances and task continuity, which further limits the practical application of these algorithms. To address these challenges, this paper proposes a hierarchical curiosity-driven AUV obstacle avoidance algorithm (HDAO), designed for autonomous obstacle avoidance in dynamic and uncertain underwater environments. The core design of HDAO incorporates several key innovations. Firstly, it introduces a Collision Threat Index for dynamic obstacles, which enables explicit risk perception and quantifies collision threats, thereby enhancing the policy’s generalization and robustness. Secondly, a task-decoupled hierarchical architecture is employed to synergistically optimize global path planning and local obstacle avoidance behaviors. This approach effectively manages long-horizon navigation tasks while alleviating high-dimensional training pressure. Finally, a novel reward mechanism is designed by integrating hierarchical active exploration with curiosity-driven passive exploration. 
This mechanism effectively incentivizes the agent to explore unvisited areas under sparse reward conditions and dynamically balances exploration and exploitation. Experimental results demonstrate that HDAO significantly outperforms existing methods in terms of obstacle avoidance success rate, training convergence speed and robustness against external disturbances.
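The abstract does not specify how the curiosity signal is computed. As a minimal sketch of the general idea of adding an exploration bonus to a sparse task reward, the snippet below uses a simple count-based novelty bonus as a stand-in. The class `CountCuriosity`, the function `shaped_reward`, and the parameters `cell_size` and `beta` are hypothetical names chosen for illustration and are not taken from HDAO.

```python
import math
from collections import defaultdict


class CountCuriosity:
    """Count-based novelty bonus (an assumed stand-in, not the HDAO formulation)."""

    def __init__(self, cell_size=1.0, beta=0.1):
        self.cell_size = cell_size      # edge length of a grid cell (assumed unit: meters)
        self.beta = beta                # scale of the intrinsic bonus
        self.visits = defaultdict(int)  # visit count per discretized cell

    def _cell(self, position):
        # Discretize a continuous position into a grid cell index.
        return tuple(math.floor(c / self.cell_size) for c in position)

    def bonus(self, position):
        # The bonus shrinks as a cell is revisited: beta / sqrt(visit count).
        cell = self._cell(position)
        self.visits[cell] += 1
        return self.beta / math.sqrt(self.visits[cell])


def shaped_reward(extrinsic, curiosity, position):
    # Total reward = sparse task reward + exploration bonus.
    return extrinsic + curiosity.bonus(position)
```

With this kind of bonus, rarely visited regions pay more than familiar ones, so the agent still receives a learning signal when the task reward is zero for most steps.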
- Research Article
- 10.1016/j.ijnaoe.2026.100757
- Jan 1, 2026
- International Journal of Naval Architecture and Ocean Engineering
An integrated path tracking and obstacle avoidance method for weakly maneuverable AUVs under prescribed path constraints
- Research Article
9
- 10.3390/jmse12050695
- Apr 23, 2024
- Journal of Marine Science and Engineering
This paper proposes a fusion algorithm based on state-tracking collision detection and the simulated annealing potential field (SCD-SAPF) to address the challenges of obstacle avoidance for autonomous underwater vehicles (AUVs) in dynamic environments. Navigating AUVs in complex underwater environments requires robust autonomous obstacle avoidance capabilities. The SCD-SAPF algorithm aims to accurately assess collision risks and efficiently plan avoidance trajectories. The algorithm introduces an SCD model for proactive collision risk assessment, predicting collision risks between AUVs and dynamic obstacles. Additionally, it proposes a simulated annealing (SA) algorithm to optimize trajectory planning in a simulated annealing potential field (SAPF), integrating the SCD model with the SAPF algorithm to guide AUVs in obstacle avoidance by generating optimal heading and velocity outputs. Extensive simulation experiments demonstrate the effectiveness and robustness of the algorithm in various dynamic scenarios, enabling the early avoidance of dynamic obstacles and outperforming traditional methods. This research provides an accurate collision risk assessment and efficient obstacle avoidance trajectory planning, offering an innovative approach to the field of underwater robotics and supporting the enhancement of AUV autonomy and reliability in practical applications.
- Research Article
121
- 10.1109/access.2020.2970433
- Jan 1, 2020
- IEEE Access
Autonomous underwater vehicles (AUVs) play an increasingly important role in ocean exploration. Existing AUVs are usually not fully autonomous and are generally limited to pre-planned or pre-programmed tasks. Reinforcement learning (RL) and deep reinforcement learning have been introduced into AUV design and research to improve autonomy. However, these methods are still difficult to apply directly to actual AUV systems because of sparse rewards and low learning efficiency. In this paper, we propose a deep interactive reinforcement learning method for AUV path following that combines the advantages of deep reinforcement learning and interactive RL. In addition, since a human trainer cannot provide rewards while the AUV is running in the ocean, and the AUV needs to adapt to a changing environment, we further propose a deep reinforcement learning method that learns from human rewards and environmental rewards at the same time. We test our methods on two AUV path-following tasks, straight-line and sinusoidal-curve following, in simulation on the Gazebo platform. Our experimental results show that with the proposed deep interactive RL method, the AUV converges faster than a DQN learner trained only on environmental rewards. Moreover, an AUV learning with our deep RL method from both human and environmental rewards can achieve similar or even better performance than with deep interactive RL, and can adapt to the actual environment by continuing to learn from environmental rewards.
- Research Article
98
- 10.1109/tnnls.2022.3156907
- Nov 1, 2023
- IEEE Transactions on Neural Networks and Learning Systems
Due to the complexity of the ocean environment, an autonomous underwater vehicle (AUV) is disturbed by obstacles when performing tasks. Therefore, the research on underwater obstacle detection and avoidance is particularly important. Based on the images collected by a forward-looking sonar on an AUV, this article proposes an obstacle detection and avoidance algorithm. First, a deep learning-based obstacle candidate area detection algorithm is developed. This algorithm uses the You Only Look Once (YOLO) v3 network to determine obstacle candidate areas in a sonar image. Then, in the determined obstacle candidate areas, the obstacle detection algorithm based on the improved threshold segmentation algorithm is used to detect obstacles accurately. Finally, using the obstacle detection results obtained from the sonar images, an obstacle avoidance algorithm based on deep reinforcement learning (DRL) is developed to plan a reasonable obstacle avoidance path of an AUV. Experimental results show that the proposed algorithms improve obstacle detection accuracy and processing speed of sonar images. At the same time, the proposed algorithms ensure AUV navigation safety in a complex obstacle environment.
- Research Article
15
- 10.3390/jmse12040676
- Apr 18, 2024
- Journal of Marine Science and Engineering
To achieve the efficient and precise control of autonomous underwater vehicles (AUVs) in dynamic ocean environments, this paper proposes an innovative Gaussian-Process-based Model Predictive Control (GP-MPC) method. This method combines the advantages of Gaussian process regression in modeling uncertainties in nonlinear systems, and MPC’s constraint optimization and real-time control abilities. To validate the effectiveness of the proposed GP-MPC method, its performance is first evaluated for trajectory tracking control tasks through numerical simulations based on a 6-degrees-of-freedom, fully actuated, AUV dynamics model. Subsequently, for 3D scenarios involving static and dynamic obstacles, an AUV horizontal plane decoupled motion model is constructed to verify the method’s obstacle avoidance capability. Extensive simulation studies demonstrate that the proposed GP-MPC method can effectively manage the nonlinear motion constraints faced by AUVs, significantly enhancing their intelligent obstacle avoidance performance in complex dynamic environments. By effectively handling model uncertainties and satisfying motion constraints, the GP-MPC method provides an innovative and efficient solution for the design of AUV control systems, substantially improving the control performance of AUVs.
- Research Article
5
- 10.1177/01423312241237570
- Mar 24, 2024
- Transactions of the Institute of Measurement and Control
To address the poor path planning and obstacle avoidance performance of autonomous underwater vehicles (AUVs) in dynamic environments, this paper proposes a feasible rolling speed obstacle method. The method combines the rolling window method with the speed obstacle method, designs a suitable three-dimensional model predictive controller based on the rolling window method under a hybrid obstacle avoidance structure, and achieves stable tracking of the reference path by optimizing the objective function. A three-dimensional collision cone and speed obstacle cone model is constructed while the window is rolling. If the collision avoidance condition is met, the critical collision point is calculated and the AUV is guided to avoid obstacles safely by tracking that point; once collision avoidance ends, the AUV is guided to recover its trajectory. The final simulation and experimental results show that the performance of the rolling speed obstacle method in avoiding dynamic obstacles is 30% higher than the rolling window method and 40% higher than the speed obstacle method. The proposed method can effectively improve the dynamic obstacle avoidance ability of AUVs in real-time path planning.
- Conference Article
34
- 10.1109/wcsp.2019.8928110
- Oct 1, 2019
With the growing use of UAVs in reconnaissance, agriculture, logistics and entertainment, autonomous collision avoidance during flight has become a necessary capability for modern UAVs to sense the surrounding environment and guarantee their own safety. Autonomous obstacle avoidance is a typical agent decision-making problem. Unfortunately, existing traditional decision-making methods perform poorly in this realm. In particular, they are unable to meet the requirements of three-dimensional obstacle avoidance for UAVs, so we introduce the deep reinforcement learning (DRL) technique into autonomous obstacle avoidance. We model the obstacle avoidance process as a Markov Decision Process and introduce a structure composed of double joint neural network estimators as the decision-maker, whose input is omnidirectional sonar readings and whose output is a value function estimating future rewards. We also propose an adaptation of the memory replay procedure to optimize sampling, in which we assign weights to the transitions and sample them accordingly. Our method is applied in a 3-dimensional physics environment that contains both random dynamic obstacles and floating bouncing obstacles. The goal of the drone is to reach the terminal point without crashing. In our simulation, the Double Q-learning method with priority sampling achieves the best performance of the methods compared. Compared with traditional algorithms, the proposed algorithm not only ensures the quality of decision making, enabling the agent to learn the optimal strategy, but also effectively improves task performance and decision-making efficiency. Simulation results demonstrate its effectiveness.
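The abstract above describes assigning weights to stored transitions and sampling them accordingly, without giving the weighting rule. The following is a minimal sketch of that sampling step only, assuming the weights are supplied by the caller. `WeightedReplayBuffer` and its methods are hypothetical names, and `random.choices` from the Python standard library performs the weighted draw.

```python
import random


class WeightedReplayBuffer:
    """Fixed-size replay buffer that samples transitions in proportion to a weight.

    How the weights are computed is not given in the abstract; here they are
    simply passed in by the caller (an assumption).
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.transitions = []
        self.weights = []
        self.next_index = 0  # slot to overwrite once the buffer is full

    def add(self, transition, weight=1.0):
        if len(self.transitions) < self.capacity:
            self.transitions.append(transition)
            self.weights.append(weight)
        else:
            # Overwrite the oldest entry (ring-buffer behavior).
            self.transitions[self.next_index] = transition
            self.weights[self.next_index] = weight
        self.next_index = (self.next_index + 1) % self.capacity

    def sample(self, batch_size, rng=random):
        # Draw with replacement; the chance of each transition is
        # proportional to its weight.
        return rng.choices(self.transitions, weights=self.weights, k=batch_size)
```

A list scan like this costs time linear in the buffer size per batch, which is acceptable for small buffers; larger buffers usually move to a tree-based structure.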
- Research Article
74
- 10.1109/tase.2023.3245818
- Apr 1, 2024
- IEEE Transactions on Automation Science and Engineering
In this paper, the formation obstacle avoidance problem of autonomous underwater vehicles (AUVs) under the disturbances of ocean currents is studied. A variable formation reconfiguration and obstacle avoidance control scheme based on affine transform and the improved artificial potential field (AT-IAPF) is designed, which enables AUVs to avoid both static and dynamic obstacles under external interference and to maintain the desired time-varying formation. Because of the robustness and effectiveness of the time-varying control of AT and the obstacle avoidance control law of IAPF, the AT-IAPF algorithm improves the environmental adaptability and obstacle avoidance performance of multi-AUV systems. A stability constraint based on a Lyapunov function guarantees the stability of the multi-AUV system. A series of MATLAB simulation results verify that AUVs can effectively avoid obstacles with different formation shapes. Obstacle avoidance experiments on bionic robotic fish demonstrate the feasibility of the proposed method. Note to Practitioners: This paper was motivated by the problem of formation reconfiguration and obstacle avoidance for AUVs, but it also applies to unmanned ground vehicles (UGVs) and unmanned aerial vehicles (UAVs). Existing formation control methods usually solve the problems of formation acquisition and time-invariant maneuvering, and rarely consider formation obstacle avoidance. This paper presents a new formation obstacle avoidance method using affine transformation (AT) and improved artificial potential field (IAPF) techniques. We use the IAPF method to plan a possible path for the formation in the obstacle environment. At the same time, the appropriate formation shape is selected according to the obstacle information to better adapt to the environment. Preliminary experiments with two bionic robot fish in near-surface positions show that this method is feasible.
During the experiment, UWB is used for positioning, and a Zigbee module is used to communicate and transmit data. However, the problem of underwater communication remains to be solved, and the method has yet to be tested on multiple bionic robot fish. In future studies, we will conduct multiple actual AUV formation obstacle avoidance experiments or 3D formation control experiments underwater.
- Research Article
- 10.3390/sym17122112
- Dec 8, 2025
- Symmetry
Driven by Industry 5.0, efficient obstacle avoidance of robotic arms in dynamic environments is a key bottleneck for human–robot collaboration in smart manufacturing. Traditional path planning methods such as Rapidly-exploring Random Tree and artificial potential field work stably in static settings but exhibit flaws including path oscillation and poor real-time performance under dynamic obstacles. Deep reinforcement learning adapts to environmental changes but is limited by low sample efficiency and high computational costs, failing industrial demands. This study proposes a collaborative framework integrating improved Rapidly-exploring Random Tree Star and Deep reinforcement learning. It uses Rapidly-exploring Random Tree Star to guide Deep reinforcement learning’s strategy exploration, reducing invalid sampling by 62%, and leverages Deep reinforcement learning’s global optimization to enhance dynamic obstacle prediction. The framework achieves a task success rate of 93.8%, surpassing traditional Rapidly-exploring Random Tree Star by 21.5%, with an average path length of 1.97 m and system energy consumption of 12.6 kWh. Experiments demonstrate superior performance in extreme dynamic scenarios, including a 94.7% success rate in multi-robot collaboration. Industrial cases confirm improvements in automobile manufacturing assembly cycle time to 8.4 s per task, yield rate to 98.7%, and reductions in energy consumption by 34% and human intervention by 85.6%, providing a reliable dynamic obstacle avoidance solution for Industry 5.0 applications.
- Research Article
2
- 10.3390/app15052776
- Mar 4, 2025
- Applied Sciences
In dynamic and unstructured environments, the obstacle avoidance capabilities of Unmanned Aerial Vehicles (UAVs) are crucial for mission success. Traditional methods struggle with adaptability and effectiveness in unknown or changing scenes. In contrast, commonly used deep reinforcement learning (DRL) methods suffer from slow convergence, reduced accuracy, and inadequate robustness due to the loss of sensitivity to outliers and parameter rigidity. To address these challenges, we propose an enhanced DRL framework that leverages a Dynamic Huber loss function tailored for UAV autonomous obstacle avoidance. By incorporating soft updates for the target network and dynamically tuning the Huber loss, the proposed method facilitates faster model convergence, superior control precision, and improved robustness. Both theoretical analysis and experimental simulation verify its effectiveness, with a superior planning success rate, shorter average path length, and faster model convergence than traditional approaches. Specifically, in static environments, the Dynamic Huber-loss-based DRL framework achieves a 98.85% success rate with an optimized average path length of 10.73; in dynamic environments, it attains a 74.20% success rate with an average path length of 37.04; with wind disturbances added to the dynamic environment, it attains a 70.95% success rate with an average path length of 40.40, highlighting its enhanced performance and adaptability.
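For readers unfamiliar with the two ingredients named above, the sketch below shows the standard Huber loss with a fixed threshold and a standard soft (Polyak) target update. The rule this paper uses to tune the threshold dynamically is not given in the abstract and is not reproduced here; `delta`, `tau`, and both function names are illustrative assumptions.

```python
def huber_loss(error, delta=1.0):
    """Standard Huber loss: quadratic near zero, linear beyond `delta`."""
    abs_error = abs(error)
    if abs_error <= delta:
        return 0.5 * error * error
    return delta * (abs_error - 0.5 * delta)


def soft_update(target_params, online_params, tau=0.005):
    """Soft target update: move each target parameter a fraction `tau`
    of the way toward the matching online parameter."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]
```

A smaller `delta` makes the loss behave more like absolute error and damps the effect of large temporal-difference errors, while a larger `delta` behaves more like squared error. That trade-off is presumably what a dynamic schedule would adjust during training.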
- Research Article
199
- 10.1109/tase.2020.3001183
- Jun 30, 2020
- IEEE Transactions on Automation Science and Engineering
This article addresses the tracking control problem of 3-D trajectories for underactuated underwater robotic vehicles operating in a constrained workspace including obstacles. More specifically, a robust nonlinear model predictive control (NMPC) scheme is presented for the case of underactuated autonomous underwater vehicles (AUVs) (i.e., unicycle-like vehicles actuated only in the surge, heave, and yaw). The purpose of the controller is to steer the unicycle-like AUV to the desired trajectory with guaranteed input and state constraints (e.g., obstacles, predefined vehicle velocity bounds, and thruster saturations) inside a partially known and dynamic environment where the knowledge of the operating workspace is constantly updated via the vehicle’s onboard sensors. In particular, considering the sensing range of the vehicle, obstacle avoidance with any of the detected obstacles is guaranteed by the online generation of a collision-free trajectory tracking path, despite the model dynamic uncertainties and the presence of external disturbances representing ocean currents and waves. Finally, realistic simulation studies verify the performance and efficiency of the proposed framework. Note to Practitioners: This article was motivated by the problem of robust trajectory tracking for an autonomous underwater vehicle (AUV) operating in an uncertain environment where the knowledge of the operating workspace (e.g., obstacle positions) is constantly updated online via the vehicle’s onboard sensors (e.g., multibeam imaging sonars and laser-based vision systems).
In addition, there may be other system limitations (e.g., thruster saturation limits) and other operational constraints induced by various common underwater tasks (e.g., a predefined vehicle speed limit for inspecting the seabed, and mosaicking), which should also be considered in the control strategy. However, existing trajectory tracking control approaches for underwater robotics lack an autonomous control scheme that provides a complete and credible control strategy taking the aforementioned issues into consideration. Based on this, we present a reliable control strategy that takes into account the aforementioned issues, along with dynamic uncertainties of the model and the presence of ocean currents. In future research, we will extend the proposed methodology to multiple AUVs performing collaborative inspection tasks in an uncertain environment.
- Conference Article
1
- 10.1109/icset53708.2021.9612434
- Nov 6, 2021
The Hybrid Autonomous Underwater Glider (HAUG) is a vehicle used for underwater missions such as monitoring and finding new underwater resources. The HAUG has good endurance and maneuverability compared to a conventional Autonomous Underwater Vehicle (AUV) or Autonomous Underwater Glider (AUG) because it has two operational modes: AUV mode and AUG mode. During a mission, the HAUG may encounter an obstacle that threatens its safety. Therefore, the HAUG should be able to detect and avoid obstacles. A Gemini 720im Imaging Forward Looking Sonar (FLS) is used for obstacle detection in this work. The main issue in underwater obstacle detection is the noisy data received by the sonar, which the obstacle detection design in this work aims to overcome. A Frost filter and local histogram entropy are used in the sonar data processing. The processed sonar data are provided in the local sonar frame and are then used by the obstacle avoidance system. BK-product fuzzy and reactive algorithms are used for obstacle avoidance. In this paper, we add some procedures to those obstacle avoidance algorithms to handle huge or non-complex U-shaped obstacles. Both the obstacle detection and avoidance simulations run in ROS (Robot Operating System). The obstacle detection simulation shows that obstacles of different sizes can be detected with an average error of approximately 0.335 meters. The obstacle avoidance simulations are run in AUV mode with no ocean current applied. Two cases of obstacle avoidance are simulated: one using simulated lidar as the sensor output and one using the sonar plugin provided by Gazebo. Obstacle avoidance using simulated lidar yields error values of approximately 10.12 meters, 103.62 meters, and 354.4 meters, respectively. The obstacle avoidance simulation with the sonar plugin yields an error value of 6.55 meters.
- Research Article
41
- 10.3390/jmse9030252
- Feb 27, 2021
- Journal of Marine Science and Engineering
This research aims to solve the safe navigation problem of autonomous underwater vehicles (AUVs) in the deep ocean, a complex and changeable environment with varied mountainous terrain. When navigating in the deep sea, an AUV encounters many underwater canyons, whose hard valley walls seriously threaten its safety. To solve the problem of safe AUV navigation in underwater canyons and to explore the potential of autonomous AUV obstacle avoidance in uncertain environments, an improved AUV path planning algorithm based on the deep deterministic policy gradient (DDPG) algorithm is proposed in this work. The method is an end-to-end path planning algorithm that optimizes the strategy directly. It takes sensor information as input and produces driving speed and yaw angle as outputs. The path planning algorithm can reach the predetermined target point while avoiding large-scale static obstacles, such as valley walls in the simulated underwater canyon environment, as well as sudden small-scale dynamic obstacles, such as marine life and other vehicles. In addition, to handle the multi-objective structure of obstacle avoidance in path planning, this research modularizes the reward function design and incorporates the artificial potential field method to set continuous rewards. This research also proposes a new algorithm called the deep SumTree-deterministic policy gradient algorithm (SumTree-DDPG), which improves the random storage and extraction strategy for experience samples in the DDPG algorithm. The samples are classified according to their importance and stored using the SumTree structure, high-quality samples are extracted continuously, and the SumTree-DDPG algorithm ultimately improves the convergence speed of the model.
Finally, this research implements an underwater canyon simulation environment in Python and builds a deep reinforcement learning simulation platform on a high-performance computer to train the AUV in simulation. Simulation data verify that the proposed path planning method can guide the under-actuated underwater robot to the target without colliding with any obstacles. Compared with the DDPG algorithm, the improved SumTree-DDPG planner in this study shows better stability, total training reward, and robustness.
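The abstract names a SumTree structure for storing experience samples by importance but does not give its implementation. The sketch below is a generic array-backed sum tree of the kind commonly used for priority-proportional sampling, offered under that assumption. The class and method names are illustrative and are not taken from the SumTree-DDPG paper.

```python
class SumTree:
    """Binary tree stored in a flat list: leaves hold priorities and each
    internal node holds the sum of its two children."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)  # internal nodes, then leaves
        self.data = [None] * capacity           # stored experience samples
        self.write = 0                          # next data slot to fill

    def total(self):
        # The root holds the sum of all priorities.
        return self.tree[0]

    def update(self, leaf, priority):
        # Set a leaf priority and propagate the change up to the root.
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def add(self, priority, item):
        # Store an item, overwriting the oldest slot once the tree is full.
        leaf = self.write + self.capacity - 1
        self.data[self.write] = item
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def get(self, value):
        # Descend from the root to the leaf whose cumulative priority
        # range contains `value`.
        node = 0
        while 2 * node + 1 < len(self.tree):
            left = 2 * node + 1
            if value <= self.tree[left]:
                node = left
            else:
                value -= self.tree[left]
                node = left + 1
        return node, self.tree[node], self.data[node - self.capacity + 1]
```

To sample in proportion to priority, a caller would draw `value` uniformly from the interval between 0 and `total()` and pass it to `get`. Both the update and the lookup take time logarithmic in the capacity, which is the usual reason for choosing this structure over a linear scan.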
- Conference Article
4
- 10.1109/mlcr57210.2022.00013
- Oct 1, 2022
With the increasing demand for ocean exploration, higher requirements on both autonomy and intelligence have been placed on the development of Autonomous Underwater Vehicles (AUVs). To this end, deep reinforcement learning methods have started being used in recent years to improve AUV autonomy and intelligence. However, the low learning efficiency and high learning cost of traditional deep reinforcement learning prevent its application to physical AUV systems in real underwater environments. Therefore, this paper proposes a deep interactive reinforcement learning method based on the Deep Deterministic Policy Gradient (DDPG) algorithm for continuous motion control in AUV path following. The highlight of the proposed method is the design of a new reward allocator. Specifically, unlike current deep interactive reinforcement learning methods, we allow the human trainer to provide a preferred action based on an evaluation of the AUV's current situation. The reward allocator then assigns rewards indirectly based on the preferred action, to deal with the high frequency of continuous action changes of the AUV. The proposed method was tested on a sinusoidal curve following task in the Gazebo simulation platform with an AUV simulator from our lab. The experimental results and analysis show that AUV path following with the proposed method can learn a more stable policy about 100 episodes faster than learning from only environmental rewards or only human rewards.
- Research Article
22
- 10.1016/j.oceaneng.2024.117287
- Feb 28, 2024
- Ocean Engineering
Imitation learning from imperfect demonstrations for AUV path tracking and obstacle avoidance