Water drones have significant potential for environmental monitoring, search and rescue operations, and marine infrastructure inspection, but the specific conditions of the water environment make stable autonomous navigation difficult to implement. The object of research in this paper is the machine learning process for autonomous navigation of a water drone model in a simulated water environment. The purpose of the study is to implement a neural network model for autonomous navigation of a water drone using a reinforcement learning method that improves obstacle avoidance and adaptation to water currents. To achieve this purpose, a new neural network model for autonomous drone navigation in the water environment, based on reinforcement learning, is proposed. It differs from existing models in that it uses an improved drone control algorithm that takes into account the speed and direction of the water current, which makes it possible to stabilize the process of generating the neural network coefficients. To ensure effective training and optimization of the model, a simulation training environment was developed using the USVSim simulator; it contains various factors that interfere with the drone's movement, such as water current and the presence of other objects. The water drone, acting as an agent, gradually learns through trial and error to choose the actions that maximize positive rewards, interacting with the environment and adapting to changing conditions. This process uses a Deep Q-Network: the drone passes its current state to a deep neural network, which processes the data, predicts the value of the most effective action, and returns it to the agent. The current state of the drone is a set of sensor readings measuring the distance to the nearest obstacles, together with the drone's heading and its current distance to the goal.
The action value received from the neural network is converted into a rudder command that the drone can execute. The drone's thruster power is calculated separately, by formulas that use trigonometric functions. The results of the study showed that the proposed model allows the drone to make decisions in a dynamic water environment where rapid adaptation to changes is required. The model successfully adapted its strategy based on feedback from the environment, so the implemented model shows significant potential for further research and applications in the field of autonomous water drones, especially in changing and unpredictable environments.
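The agent loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the rudder angle set, the function names, the epsilon-greedy exploration rule, and the discount factor are assumptions made for illustration, and the thruster power formulas are omitted because the abstract does not state them. The sketch shows how a state vector might be assembled from sensor readings, heading, and goal distance, how an action could be chosen from predicted Q-values, and how the standard Deep Q-Network training target is formed.

```python
import math
import random
from typing import List, Sequence

# Hypothetical discrete rudder angles (degrees); the abstract does not give the action set.
RUDDER_ANGLES_DEG = [-30.0, -15.0, 0.0, 15.0, 30.0]


def build_state(ranges: Sequence[float], heading_rad: float, goal_distance: float) -> List[float]:
    """Concatenate obstacle distances, heading, and distance to goal into one state vector."""
    return [*ranges, heading_rad, goal_distance]


def select_action(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy choice (assumed): explore with probability epsilon, else take argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


def action_to_rudder(action: int) -> float:
    """Map a discrete action index to a rudder angle in radians."""
    return math.radians(RUDDER_ANGLES_DEG[action])


def td_target(reward: float, next_q_values: Sequence[float], done: bool, gamma: float = 0.99) -> float:
    """Standard DQN target: r + gamma * max Q(s', a'), or just r when the episode ends."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


if __name__ == "__main__":
    rng = random.Random(0)
    state = build_state([2.0, 5.0, 1.5], heading_rad=0.1, goal_distance=12.0)
    # Stand-in for the network output; a real Q-network would compute this from `state`.
    q_values = [0.1, 0.4, 0.9, 0.2, 0.0]
    action = select_action(q_values, epsilon=0.1, rng=rng)
    print(len(state), action, action_to_rudder(action))
```

In this sketch the network output is a fixed list so the example runs without a deep learning library; in practice `q_values` would come from a trained model evaluated on `state`.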