Abstract
What kinds of strategies subjects follow in various behavioral circumstances has been a central issue in decision making. In particular, which behavioral strategy, maximizing or matching, is more fundamental to animals' decision behavior has been a matter of debate. Here, we prove that any algorithm that achieves the stationary condition for maximizing the average reward leads to matching when it ignores the dependence of the expected outcome on the subject's past choices. We term this strategy of partial reward maximization the "matching strategy". We then apply this strategy to the case where the subject's decision system updates the information used for making a decision. Such information includes the subject's past actions or sensory stimuli, and the internal storage of this information is often called "state variables". We demonstrate that the matching strategy provides an easy way to maximize reward when combined with exploration of the state variables that correctly represent the information crucial for reward maximization. Our results reveal for the first time how a strategy that achieves matching behavior is beneficial to reward maximization, providing a novel insight into the relationship between maximizing and matching.
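The abstract's central claim can be made concrete with a standard formulation of the matching law and of the stationarity condition for average-reward maximization. The notation below (B_i, R_i, p_i, a_i) is our own and is not taken from the paper; it is a sketch of the argument, not the authors' derivation.

% Matching law: choices B_i are allocated in proportion to the rewards R_i
% obtained from each alternative (equivalently, the returns R_i/B_i are equal).
\[
  \frac{B_i}{\sum_j B_j} = \frac{R_i}{\sum_j R_j}
  \quad\Longleftrightarrow\quad
  \frac{R_1}{B_1} = \frac{R_2}{B_2} = \cdots
\]
% Average reward under a choice policy p = (p_1, ..., p_n):
\[
  \langle r \rangle(p) = \sum_i p_i \, \mathbb{E}\!\left[ r \mid a_i, p \right].
\]
% Stationarity of the average reward under the constraint \sum_i p_i = 1
% requires, for every alternative chosen with nonzero probability,
\[
  \frac{\partial \langle r \rangle}{\partial p_i}
  = \mathbb{E}\!\left[ r \mid a_i, p \right]
  + \sum_k p_k \,\frac{\partial}{\partial p_i}\,\mathbb{E}\!\left[ r \mid a_k, p \right]
  = \text{const.}
\]
% Dropping the second term, i.e. ignoring how the expected outcome depends on
% the subject's past choices (the policy p), equalizes the expected return of
% every chosen alternative, which is exactly the matching condition above.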
Summary
Decision-making is a fundamental process by which the brain organizes behavior, and it depends crucially on how subjects have been rewarded for their past behavioral responses. A well-known example is the reinforcement learning theory based on the temporal difference (TD) error algorithm[1], which is powerful enough to solve difficult problems in machine control and accounts for basal-ganglia activity representing reward expectancy in monkeys and humans[2,3,4]. Many algorithms in machine learning and other brain-style computations aim at reward maximization or, somewhat more generally, at optimization of a given cost function. In contrast, animals often exhibit matching behavior in a variety of decision-making tasks[6,7,8,9], even when such behavior does not maximize reward. Matching and maximizing are mathematically equivalent in simple tasks[10,11], but not in arbitrary tasks[12,13,14,15].
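To illustrate how a matching (melioration-like) update rule behaves on a reward schedule, here is a minimal simulation sketch. It is not taken from the paper: the concurrent variable-interval task, the local return estimates, and all parameter values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Concurrent variable-interval (VI) schedule: each option is "baited"
# independently with a fixed probability per trial, and a baited reward
# persists until that option is next chosen (illustrative assumption).
BAIT_PROB = np.array([0.2, 0.05])   # option 1 is richer than option 2
N_TRIALS = 20000
LEARNING_RATE = 0.005

p_choose_1 = 0.5                    # probability of choosing option 1
baited = np.array([False, False])
choices = np.zeros(N_TRIALS, dtype=int)
rewards = np.zeros(N_TRIALS)

# Running estimates of the local return (reward per choice) of each option.
returns = np.array([0.5, 0.5])

for t in range(N_TRIALS):
    # Baiting step: an unbaited option may become baited this trial.
    baited |= rng.random(2) < BAIT_PROB

    # Choose stochastically according to the current allocation.
    a = 0 if rng.random() < p_choose_1 else 1
    r = 1.0 if baited[a] else 0.0
    baited[a] = False               # collecting a reward consumes the bait

    # Update the return estimate of the chosen option only.
    returns[a] += 0.1 * (r - returns[a])

    # "Matching strategy" (melioration-like) update: shift allocation toward
    # the option with the higher estimated return, ignoring how the returns
    # themselves depend on the allocation.
    p_choose_1 += LEARNING_RATE * (returns[0] - returns[1])
    p_choose_1 = np.clip(p_choose_1, 0.01, 0.99)

    choices[t], rewards[t] = a, r

# Compare choice fractions with obtained-reward (income) fractions over the
# second half of the session: the matching law predicts they should be equal.
tail = slice(N_TRIALS // 2, None)
choice_frac = np.mean(choices[tail] == 0)
income_frac = rewards[tail][choices[tail] == 0].sum() / rewards[tail].sum()
print(f"choice fraction for option 1: {choice_frac:.3f}")
print(f"income fraction for option 1: {income_frac:.3f}")

At the equilibrium of this update the two estimated returns are equal, so the printed choice and income fractions should approximately coincide, which is the matching relation; on this particular schedule that allocation also happens to lie close to the reward-maximizing one.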