Abstract

Human performance approaches that of an ideal observer and optimal actor in some perceptual and motor tasks. These optimal abilities depend on the capacity of the cerebral cortex to store an immense amount of information and to flexibly make rapid decisions. However, behavior only approaches these limits after a long period of learning while the cerebral cortex interacts with the basal ganglia, an ancient part of the vertebrate brain that is responsible for learning sequences of actions directed toward achieving goals. Progress has been made in understanding the algorithms used by the brain during reinforcement learning, which is an online approximation of dynamic programming. Humans also make plans that depend on past experience by simulating different scenarios, which is called prospective optimization. The same brain structures in the cortex and basal ganglia that are active online during optimal behavior are also active offline during prospective optimization. The emergence of general principles and algorithms for goal-directed behavior has consequences for the development of autonomous devices in engineering applications.

Highlights

  • Bellman’s approach to optimizing a sequence of actions to reach a goal is based on known state transitions and payoffs [1] (written out as the Bellman optimality equation after this list)

  • The dorsal and ventral basal ganglia are heavily innervated by dopamine neurons from the substantia nigra pars compacta and the ventral tegmental area, respectively; these dopaminergic inputs are involved in reward and reinforcement learning

  • Prospective optimization has become highly elaborated as the cortex and basal ganglia evolved to support increasingly longer time horizons and more complex behaviors
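
As a compact statement of the first highlight, the Bellman optimality equation expresses the value of a state in terms of known transition probabilities and payoffs. The notation below (states s, actions a, payoff R, transition probabilities P, discount factor gamma) is standard textbook usage rather than the paper's own:

    V(s) = \max_a \Big[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \Big]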

Summary

INTRODUCTION

Bellman’s approach to optimizing a sequence of actions to reach a goal is based on known state transitions and payoffs [1]. The temporal-difference algorithm in reinforcement learning is closely related to the Rescorla–Wagner model of classical conditioning [3], [4], and it approximates dynamic programming [5]. This approach constructs a consistent value function for states and actions based on feedback from the environment. Trained only on the reward at the end of each game, TD-Gammon, a backgammon program based on temporal-difference learning, discovered new strategies that had eluded experts. This illustrates the ability of reinforcement learning to solve the temporal credit assignment problem and to learn complex strategies that lead to winning ways. We examine how brains form cognitive strategies by prospective optimization, the planning of future actions to optimize rewards. These more advanced aspects of reinforcement learning have the potential to greatly enhance the performance of autonomous control systems.
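
To make the temporal-difference update concrete, the sketch below applies TD(0) value learning to a toy random-walk task. This is a minimal illustration under stated assumptions: the environment, the constants N_STATES, ALPHA, and GAMMA, and the helper step() are invented for this example and do not come from the paper. The delta term is the reward-prediction error that the later section links to dopamine neurons.

    # TD(0) value learning on a toy random walk (illustrative sketch).
    import random

    N_STATES = 5   # states 0..4; state 4 is terminal and pays reward 1
    ALPHA = 0.1    # learning rate
    GAMMA = 0.9    # discount factor

    def step(state):
        """Move randomly left or right; entering the last state pays 1."""
        next_state = min(N_STATES - 1, max(0, state + random.choice([-1, 1])))
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        return next_state, reward

    values = [0.0] * N_STATES  # value estimates, initialized to zero

    for _ in range(5000):
        s = 0
        while s != N_STATES - 1:
            s_next, r = step(s)
            # Reward-prediction error: actual outcome minus prediction.
            delta = r + GAMMA * values[s_next] - values[s]
            values[s] += ALPHA * delta
            s = s_next

    print([round(v, 2) for v in values])

After training, the learned values rise monotonically toward the rewarded state: TD learning propagates the terminal reward backward through the sequence of visited states, which is the essence of solving the temporal credit assignment problem from end-of-game feedback alone, as TD-Gammon did.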

IDEAL OBSERVERS AND PERFORMERS
LEARNING WHERE TO LOOK
DOPAMINE NEURONS AND REWARD-PREDICTION ERROR
ELABORATIONS OF BASIC CIRCUITS AND PROSPECTIVE OPTIMIZATION
A CONCEPTUAL FRAMEWORK FOR PROSPECTIVE OPTIMIZATION
CONCLUSION