Abstract
Value-based methods play a fundamental role in Markov decision processes (MDPs) and reinforcement learning (RL). In this paper, we present a unified control-theoretic framework for analyzing valued-based methods such as value computation (VC), value iteration (VI), and temporal difference (TD) learning (with linear function approximation). Built upon an intrinsic connection between value-based methods and dynamic systems, we can directly use existing convex testing conditions in control theory to derive various convergence results for the aforementioned value-based methods. These testing conditions are convex programs in form of either linear programming (LP) or semidefinite programming (SDP), and can be solved to construct Lyapunov functions in a straightforward manner. Our analysis reveals some intriguing connections between feedback control systems and RL algorithms. It is our hope that such connections can inspire more work at the intersection of system/control theory and RL.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.