Abstract
Human performance approaches that of an ideal observer and optimal actor in some perceptual and motor tasks. These optimal abilities depend on the capacity of the cerebral cortex to store an immense amount of information and to flexibly make rapid decisions. However, behavior only approaches these limits after a long period of learning while the cerebral cortex interacts with the basal ganglia, an ancient part of the vertebrate brain that is responsible for learning sequences of actions directed toward achieving goals. Progress has been made in understanding the algorithms used by the brain during reinforcement learning, which is an online approximation of dynamic programming. Humans also make plans that depend on past experience by simulating different scenarios, which is called prospective optimization. The same brain structures in the cortex and basal ganglia that are active online during optimal behavior are also active offline during prospective optimization. The emergence of general principles and algorithms for goal-directed behavior has consequences for the development of autonomous devices in engineering applications.
Highlights
Bellman’s approach to optimizing a sequence of actions to reach a goal is based on known state transitions and payoffs [1]
The dorsal and ventral basal ganglia are heavily innervated by inputs from dopamine neurons from the substantia nigra pars compacta or ventral tegmental area, which are involved in rewards and reinforcement learning
Prospective optimization has become highly elaborated as the cortex and basal ganglia evolved to support increasingly longer time horizons and more complex behaviors
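Bellman's dynamic-programming approach, mentioned in the first highlight, can be illustrated with a minimal value-iteration sketch. The three-state world, its transitions, and its payoffs below are illustrative assumptions, not taken from the article; the point is only that when state transitions and payoffs are fully known, the value of each state can be computed by repeatedly enforcing the Bellman consistency condition V(s) = max_a [r(s, a) + γ·V(s')].

```python
GAMMA = 0.9  # discount factor for future payoffs

# Hypothetical deterministic world (an assumption for illustration):
# transitions[state][action] = (next_state, reward)
transitions = {
    "start": {"left": ("mid", 0.0), "right": ("goal", 1.0)},
    "mid":   {"left": ("start", 0.0), "right": ("goal", 2.0)},
    "goal":  {},  # terminal state, value fixed at 0
}

def value_iteration(transitions, gamma=GAMMA, tol=1e-8):
    """Solve the Bellman equation by repeated sweeps over all states."""
    V = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:
                continue  # terminal state
            # Bellman backup: best action under the current value estimate
            best = max(r + gamma * V[s2] for s2, r in actions.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

V = value_iteration(transitions)
# From "mid", going right pays 2.0 immediately, so V["mid"] = 2.0;
# from "start", going left then right pays 0 + 0.9 * 2.0 = 1.8.
```

Note that this computation requires the full transition and payoff model in advance; reinforcement learning, discussed in the summary, replaces that requirement with feedback from the environment.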
Summary
Bellman’s approach to optimizing a sequence of actions to reach a goal is based on known state transitions and payoffs [1]. The temporal-difference algorithm in reinforcement learning is closely related to the Rescorla–Wagner model [3], [4], and approximates dynamic programming [5]. This approach constructs a consistent value function for states and actions based on feedback from the environment. On the basis of the reward at the end of each game, TD-Gammon discovered new strategies that had eluded expert players. This illustrates the ability of reinforcement learning to solve the temporal credit assignment problem and to learn complex strategies that lead to winning play. We examine how brains form cognitive strategies by prospective optimization—planning future actions to optimize rewards. These more advanced aspects of reinforcement learning have the potential to greatly enhance the performance of autonomous control systems.
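The temporal-difference update at the heart of this account can be sketched in a few lines. The two-step chain of states below is a hypothetical example, not from the article; it shows how the prediction error δ = r + γ·V(s') − V(s), the analogue of the Rescorla–Wagner surprise term, drives the value function toward self-consistency using only rewards observed at the end of each episode.

```python
GAMMA = 1.0   # no discounting over this short horizon
ALPHA = 0.1   # learning rate

def td0(episodes=2000):
    """TD(0) value learning on a hypothetical chain A -> B -> end."""
    V = {"A": 0.0, "B": 0.0, "end": 0.0}
    for _ in range(episodes):
        # One episode: A -> B yields reward 0, B -> end yields reward 1.
        for s, s2, r in [("A", "B", 0.0), ("B", "end", 1.0)]:
            delta = r + GAMMA * V[s2] - V[s]  # temporal-difference error
            V[s] += ALPHA * delta             # nudge V(s) toward its target
    return V

V = td0()
# Both V["A"] and V["B"] converge toward 1.0, the reward reached at the end,
# even though state A is never directly rewarded: the error signal propagates
# credit backward in time, which is the temporal credit assignment problem.
```

TD-Gammon used this same error signal, with a neural network in place of the lookup table, to learn a value function for backgammon positions from nothing but game outcomes.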
Published in: Proceedings of the IEEE (Institute of Electrical and Electronics Engineers)