Abstract

Whether animals behave optimally is an open question of great importance, both theoretically and in practice. Attempts to answer this question focus on two aspects of the optimization problem: the quantity to be optimized and the optimization process itself. In this paper, we take the abstract concept of cost as the quantity to be minimized and propose a reinforcement learning algorithm, called Value-Gradient Learning (VGL), as a computational model of behavior optimality. We prove that, unlike standard reinforcement-learning models such as Temporal Difference learning, VGL is guaranteed to converge to optimality under certain conditions. The core of the proof is the mathematical equivalence of VGL and Pontryagin's Minimum Principle, a well-known optimization technique in systems and control theory. Given the similarity between VGL's formulation and regulatory models of behavior, we argue that our algorithm may provide psychologists with a tool to formulate such models in optimization terms.
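
Since only the abstract is available here, the following is a minimal sketch of what a VGL-style update can look like, written against our own illustrative assumptions rather than the authors' actual formulation: a one-dimensional linear-quadratic task, a linear gradient critic G(x) = w*x approximating dV/dx, and a bootstrapped target equal to the discrete-time costate recursion of Pontryagin's Minimum Principle, G'(x) = dc/dx + gamma*(df/dx)*G(x'). All constants, names, and the toy task are hypothetical choices made for brevity.

```python
import numpy as np

# Toy 1-D linear-quadratic task (our illustrative assumption, not from the paper):
#   dynamics:  x' = a*x + b*u      cost:  c(x, u) = q*x**2 + r*u**2
a, b, q, r, gamma = 0.9, 1.0, 1.0, 0.5, 0.95
alpha = 0.01   # learning rate for the gradient critic
w = 0.0        # linear gradient critic  G(x) = w*x,  approximating dV/dx

def greedy_action(x, w):
    """Minimise c(x,u) + gamma*V(x') over u, using G in place of dV/dx.
    The stationarity condition 2*r*u + gamma*b*G(a*x + b*u) = 0 has a
    closed-form solution because G is linear (the denominator stays
    positive for w >= 0)."""
    return -gamma * a * b * w * x / (2.0 * r + gamma * b**2 * w)

rng = np.random.default_rng(0)
for episode in range(2000):
    x = rng.uniform(-1.0, 1.0)
    for t in range(20):
        u = greedy_action(x, w)
        x_next = a * x + b * u
        # VGL(0)-style target = discrete-time costate recursion:
        #   G'(x) = dc/dx + gamma * (df/dx) * G(x_next)
        # Only partial derivatives appear: at the greedy action the du/dx
        # terms cancel (envelope theorem), mirroring Pontryagin's principle.
        g_target = 2.0 * q * x + gamma * a * (w * x_next)
        # One gradient step on (G'(x) - G(x))^2, with dG/dw = x:
        w += alpha * x * (g_target - w * x)
        x = x_next

print("learned gradient-critic weight w =", w)
```

The point of the sketch is the target line: because the action is greedy, the terms involving the policy's derivative du/dx drop out, leaving exactly the costate equation of Pontryagin's Minimum Principle. This is the correspondence the abstract identifies as the core of the convergence proof.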
