Abstract

In this paper, a novel successive approximation framework, named hybrid iteration (HI), is proposed to bridge the performance gap between two well-known dynamic programming algorithms, policy iteration (PI) and value iteration (VI). Using HI, an approximate optimal control policy can be learned without prior knowledge of an initial admissible control policy, as required by PI. At the same time, the HI algorithm converges to the optimal solution much faster than VI, and thus requires far fewer learning iterations and much less CPU time. We first develop a model-based HI algorithm, and then extend it to a data-driven HI algorithm that learns the optimal control policy without any knowledge of the system dynamics. Simulation results demonstrate the efficacy of the proposed HI algorithm.
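The abstract does not spell out the mechanics of HI, but one natural instantiation consistent with the description above is to run VI-style updates only until the greedy policy becomes stabilizing, and then switch to PI-style (Newton-type) updates. The sketch below illustrates that idea on a discrete-time LQR problem; the system matrices, the switching test, and the iteration counts are all illustrative assumptions, not the paper's actual algorithm or example.

```python
# Minimal sketch of the hybrid-iteration (HI) idea on a discrete-time
# LQR problem. All matrices and the switching rule are assumptions
# made for illustration, not taken from the paper.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[1.1, 0.2],
              [0.0, 0.9]])          # hypothetical (unstable) dynamics
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)                       # state cost
R = np.eye(1)                       # input cost

def greedy_gain(P):
    # Policy improvement: K = (R + B'PB)^{-1} B'PA
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def is_stabilizing(K):
    return max(abs(np.linalg.eigvals(A - B @ K))) < 1.0

# Phase 1 (VI-like): start from P = 0, so no admissible initial policy
# is needed; run Riccati-recursion updates only until the greedy
# policy becomes stabilizing.
P = np.zeros((2, 2))
while not is_stabilizing(greedy_gain(P)):
    K = greedy_gain(P)
    P = Q + K.T @ R @ K + (A - B @ K).T @ P @ (A - B @ K)  # VI step

# Phase 2 (PI-like): Newton-type updates, which converge rapidly
# once the policy is stabilizing.
for _ in range(10):
    K = greedy_gain(P)
    Acl = A - B @ K
    # Policy evaluation: solve P = Acl' P Acl + Q + K'RK
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)

print("approximate optimal gain:\n", greedy_gain(P))
```

The switch point matters in this reading: PI's Lyapunov-based policy evaluation is only well defined for a stabilizing gain, which is exactly what the cheap VI phase is used to produce before the fast PI phase takes over.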
