A Deep Reinforcement Learning Approach for the Patrolling Problem of Water Resources Through Autonomous Surface Vehicles: The Ypacarai Lake Case

Samuel Yanes Luis, Daniel Gutierrez Reina, Sergio L. Toral Marin

doi:10.1109/access.2020.3036938


Open Access



Abstract

Autonomous Surface Vehicles (ASV) are highly useful for continuous monitoring and exploration of water resources due to their autonomy, mobility, and relatively low cost. In the path planning context, the patrolling problem is usually addressed with heuristic approaches, such as Genetic Algorithms (GA) or Reinforcement Learning (RL), because of the problem's complexity and high dimensionality. In this paper, the patrolling problem of Ypacarai Lake (Asunción, Paraguay) has been formulated as a Markov Decision Process (MDP) for two cases: the homogeneous and the non-homogeneous scenarios. A tailored reward function has been designed for the non-homogeneous case. Two Deep Reinforcement Learning algorithms, Deep Q-Learning (DQL) and Double Deep Q-Learning (DDQL), have been evaluated to solve the patrolling problem. Furthermore, due to the high number of parameters and hyperparameters involved in the algorithms, a thorough search has been conducted to find the best values for training the neural networks and for the proposed reward function. According to the results, a suitable parameter configuration improves coverage, with more than 93% of the lake surface covered on average. In addition, the proposed approach achieves higher sample redundancy in important zones than other commonly used algorithms for non-homogeneous coverage path planning, such as Policy Gradient, the lawnmower algorithm, or random exploration, with a 64% improvement in the mean time between visits.
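
The abstract scores patrolling policies by coverage and by the mean time between visits, and distinguishes homogeneous from non-homogeneous scenarios. The sketch below is a toy grid version of such an MDP under our own assumptions: the lake is a boolean water mask, the vehicle has eight compass moves, an illegal move leaves it in place with a -1 penalty, and the reward is the importance-weighted idleness (steps since last visit) of the cell just reached. `PatrolGrid`, `ACTIONS` and this reward are hypothetical illustrations, not the paper's environment or its tailored reward function; setting every importance weight to 1.0 mimics a homogeneous case, unequal weights a non-homogeneous one.

```python
# Hypothetical 8-connected action set: N, NE, E, SE, S, SW, W, NW as (row, col) offsets.
ACTIONS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


class PatrolGrid:
    """Toy patrolling MDP on a discretized lake map (assumed model, not the paper's)."""

    def __init__(self, water, importance, start):
        self.water = water            # water[r][c] is True for navigable cells
        self.importance = importance  # per-cell weight; all 1.0 = homogeneous case
        self.pos = start
        self.t = 0
        self.last_visit = {start: 0}  # cell -> time step of its most recent visit
        self.revisit_gaps = []        # time between consecutive visits to the same cell

    def step(self, action):
        """Apply one move and return (new position, reward)."""
        dr, dc = ACTIONS[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        self.t += 1
        inside = 0 <= r < len(self.water) and 0 <= c < len(self.water[0])
        if not inside or not self.water[r][c]:
            return self.pos, -1.0  # leaving the map or hitting land: stay put, assumed penalty
        cell = (r, c)
        # Idleness = steps since the last visit (unvisited cells count from t = 0).
        idleness = self.t - self.last_visit.get(cell, 0)
        if cell in self.last_visit:
            self.revisit_gaps.append(idleness)
        self.last_visit[cell] = self.t
        self.pos = cell
        # Hypothetical reward: importance-weighted idleness of the cell just reached.
        return cell, self.importance[r][c] * idleness

    def coverage(self):
        """Fraction of navigable cells visited at least once."""
        total = sum(row.count(True) for row in self.water)
        return len(self.last_visit) / total

    def mean_time_between_visits(self):
        """Average revisit interval; infinite if no cell has been revisited yet."""
        if not self.revisit_gaps:
            return float("inf")
        return sum(self.revisit_gaps) / len(self.revisit_gaps)
```

For example, on a 2x3 all-water grid starting at (0, 0), the action sequence E, E, S, W, W, N (indices 2, 2, 4, 6, 6, 0) visits every cell, so `coverage()` returns 1.0, and the return to (0, 0) after six steps gives a `mean_time_between_visits()` of 6.0. Rewarding idleness favors cells that have gone unvisited longest, which is one simple way to encourage the revisiting behavior the mean time between visits measures.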

Highlights

Ypacaraí Lake is the largest body of water in Paraguay, with more than 60 km² of navigable surface (Fig. 1)
Policy Gradient (PG) works analogously to Double Deep Q-Learning (DDQL) but takes gradient-descent steps to optimize the behavioral policy π(s; θ) directly. In this Reinforcement Learning (RL) approach, the same neural network is used for the policy, and the equivalent hyperparameters (γ, learning rate, . . . ) remain the same in both DDQL and PG
Double Deep Q-Learning is a good approach for designing a path planner for both the homogeneous and the non-homogeneous patrolling problem since: i) it does not need a model of the environment, and ii) due to its off-policy behavior, no specific behavioral policy is needed to achieve optimality
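
The highlights attribute DDQL's suitability to it being model-free and off-policy. The sketch below contrasts the DQL and DDQL bootstrap targets, assuming the standard formulation with a separate target network and Q-values passed as plain lists; the function names, the ε-greedy behavior policy and the numbers are illustrative assumptions, not the authors' implementation. Only a sampled transition (reward, next-state Q-values, done flag) is needed, so no transition model appears, and because the target uses the greedy next action rather than the action the behavior policy would pick, the transitions may come from any exploratory policy.

```python
import random


def greedy(q_values):
    """Index of the largest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def dql_target(reward, next_q_target, gamma, done):
    """DQL target: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def ddql_target(reward, next_q_online, next_q_target, gamma, done):
    """DDQL target: the online network selects a', the target network evaluates it."""
    if done:
        return reward
    return reward + gamma * next_q_target[greedy(next_q_online)]


def epsilon_greedy(q_values, epsilon, rng):
    """Behavior policy used only to collect transitions (one possible choice)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy(q_values)


# Example: the two targets differ when the networks disagree about the best action.
rng = random.Random(0)
action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1, rng=rng)
y_dql = dql_target(1.0, [4.0, 3.0, 6.0], gamma=0.9, done=False)
y_ddql = ddql_target(1.0, [2.0, 5.0, 1.0], [4.0, 3.0, 6.0], gamma=0.9, done=False)
```

With γ = 0.9 and a reward of 1, DQL bootstraps from the target network's own maximum (6.0, giving 6.4), whereas DDQL lets the online network choose action 1 and has the target network evaluate it (3.0, giving 3.7). Decoupling action selection from action evaluation in this way is what reduces the overestimation bias of the max operator.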

Summary

Introduction

Ypacaraí Lake is the largest body of water in Paraguay, with more than 60 km² of navigable surface (Fig. 1). It is located between the cities of San Bernardino (to the east), Areguá (to the west), and Ypacaraí (to the south), and it is the main source of water supply in the area. Its importance is also related to the wildlife of the wetlands in the basin surrounding the lake. Over the past 40 years, the continuous expansion of agriculture around the lake, the lack of sewerage systems in nearby cities, and waste disposal from industries located on the shore, among other factors, have caused an abnormal eutrophication process in the lake [1].

Results

Discussion

Conclusion