Abstract

When applying the Q-learning algorithm to AUV route planning, it is difficult to strike a balance between exploration and exploitation. In practice, a fixed greedy rate, or a greedy rate that decays along a fixed schedule, is usually chosen from experience. However, a greedy rate set this way cannot match the learning environment, and the agent often falls into a local optimum or converges slowly. To solve these problems, we propose an adaptive Q-learning algorithm that guides the greedy rate by environmental complexity, where the complexity of the AUV's surroundings is evaluated with a mathematical model of environmental complexity. In simulation experiments on a two-dimensional environment, the proposed method is compared with the fixed greedy rate and the greedy rate with a fixed change trend. The results show that the adaptive Q-learning algorithm considering environmental complexity converges faster and is less prone to falling into local optima.
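The abstract does not give the environmental-complexity model itself, so the following is only a minimal sketch of the core idea: an epsilon-greedy Q-learning loop on a grid whose greedy rate rises where the environment is complex and falls where it is simple. The complexity measure used here (local obstacle density around the agent), as well as the grid layout, rewards, and the EPS_MIN/EPS_MAX bounds, are hypothetical stand-ins, not the paper's actual model or parameters.

```python
# Sketch: Q-learning with a greedy rate adapted to a local
# environmental-complexity estimate. The complexity proxy below
# (obstacle density in a window around the agent) is an assumption;
# the paper's mathematical model is not specified in the abstract.
import numpy as np

rng = np.random.default_rng(0)

GRID = np.zeros((20, 20), dtype=int)          # 0 = free cell, 1 = obstacle
GRID[5:15, 10] = 1                            # a wall as a toy obstacle field
START, GOAL = (0, 0), (19, 19)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

Q = np.zeros((*GRID.shape, len(ACTIONS)))
ALPHA, GAMMA = 0.1, 0.95
EPS_MIN, EPS_MAX = 0.05, 0.5                  # illustrative bounds on the greedy rate

def complexity(pos, radius=2):
    """Hypothetical complexity proxy: obstacle density near the agent."""
    r, c = pos
    window = GRID[max(0, r - radius):r + radius + 1,
                  max(0, c - radius):c + radius + 1]
    return window.mean()

def epsilon(pos):
    """Adaptive greedy rate: explore more where the environment is complex."""
    return EPS_MIN + (EPS_MAX - EPS_MIN) * complexity(pos)

def step(pos, a):
    dr, dc = ACTIONS[a]
    nr, nc = pos[0] + dr, pos[1] + dc
    if not (0 <= nr < GRID.shape[0] and 0 <= nc < GRID.shape[1]) or GRID[nr, nc]:
        return pos, -5.0, False               # blocked: penalty, stay in place
    if (nr, nc) == GOAL:
        return (nr, nc), 100.0, True
    return (nr, nc), -1.0, False              # step cost favors short routes

for episode in range(500):
    pos, done = START, False
    for _ in range(400):
        if rng.random() < epsilon(pos):       # exploration scaled by complexity
            a = int(rng.integers(len(ACTIONS)))
        else:
            a = int(np.argmax(Q[pos]))
        nxt, reward, done = step(pos, a)
        Q[pos][a] += ALPHA * (reward + GAMMA * np.max(Q[nxt]) - Q[pos][a])
        pos = nxt
        if done:
            break
```

Under this scheme the agent keeps a higher greedy rate (more exploration) near obstacle-dense regions and exploits its learned values in open water, which is the mechanism the abstract credits for faster convergence and fewer local optima.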
