Abstract

This paper explores a Reinforcement Learning (RL) approach to an underwater navigation problem subject to infrequent position measurements and other unsynchronized observations, in the absence of a Doppler velocity log. In particular, an end-to-end History-Window (HW)-RL approach and a classical RL approach are studied, and their performance is compared against an anti-windup PID controller. The state used by the classical RL approach and the PID controller is estimated with an Extended Kalman Filter (EKF), while the HW-RL approach uses a history of previous measurements as the input to the neural network controller, jointly training an (encoded) state estimator and a controller. Preliminary results obtained through numerical simulations in the Webots simulator show that both the HW-RL approach and the classical RL approach (with EKF) are able to track given waypoints when position measurements are infrequent.
