Abstract

Learning is an adaptation that allows individuals to respond to environmental stimuli in ways that improve their reproductive outcomes. The degree of sophistication in learning mechanisms potentially explains variation in behavioral responses. Here, we present a model of learning that is inspired by documented intra- and interspecific variation in the performance of a simultaneous two-choice task, the biological market task. The task presents a problem that cleaner fish often face in nature: choosing between two client types, one that is willing to wait for inspection and one that may leave if ignored. The cleaner's choice hence influences the future availability of clients (i.e., it influences food availability). We show that learning the preference that maximizes food intake requires subjects to represent in their memory different combinations of pairs of client types rather than just individual client types. In addition, subjects need to account for future consequences of actions, either by estimating expected long-term reward or by experiencing a client leaving as a penalty (negative reward). Finally, learning is influenced by the absolute and relative abundance of client types. Thus, cognitive mechanisms and ecological conditions jointly explain intra- and interspecific variation in the ability to learn the adaptive response.
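The task described above can be made concrete with a small simulation. The following Python sketch is our own illustration of the setup, not the authors' implementation: the function names (`sample_state`, `step`), the client arrival probabilities, and the one-food-item reward are all assumptions made for the example.

```python
import random

# Illustrative sketch of the two-choice biological market task (assumed
# encoding): at each time step the cleaner faces a pair of slots, each
# holding a visitor client, a resident client, or nobody.

def sample_state(p_visitor=0.4, p_resident=0.4):
    """Draw the pair of clients present at the cleaning station."""
    def draw():
        u = random.random()
        if u < p_visitor:
            return "visitor"
        if u < p_visitor + p_resident:
            return "resident"
        return "absent"
    return (draw(), draw())

def step(state, choice):
    """Clean the client in slot `choice` (0 or 1); return (reward, next_state).

    Cleaning a present client yields one food item. An ignored resident
    waits and can still be cleaned next; an ignored visitor leaves, so the
    current choice shapes future food availability.
    """
    chosen, other = state[choice], state[1 - choice]
    reward = 1.0 if chosen != "absent" else 0.0
    if other == "resident":
        next_state = (other, "absent")  # the patient resident is still there
    else:
        next_state = sample_state()     # the visitor left (or no one waited)
    return reward, next_state
```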

Highlights

  • Animals must face and appropriately respond to the ever-changing nature of environmental conditions

  • In contrast to Fully Aware Agents (FAAs), Partially Aware Agents (PAAs) do not develop a preference for either option during learning, irrespective of the level of future discounting and the source of reward (a minimal sketch of the two representations follows this list)

  • Only FAAs develop a preference for the visitor client; PAAs fail to do so irrespective of client abundance (figure 5 therefore presents this analysis for FAAs only)
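The contrast between the two agent types comes down to what they hold in memory. The sketch below is our assumed illustration of that difference; the function names and encodings are hypothetical, not taken from the paper.

```python
# Assumed illustration of the two state representations (hypothetical
# encodings): the FAA keys its estimates on the combination of client
# types, the PAA on a single client type at a time.

def faa_state(pair):
    """Fully Aware Agent: memory keyed by the pair of client types,
    e.g. ("visitor", "resident"), so the pairing itself can be learned."""
    return pair

def paa_state(pair, position):
    """Partially Aware Agent: memory keyed by one client type at a time,
    e.g. "visitor", so the value of a pairing cannot be represented."""
    return pair[position]
```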


Introduction

Animals must face and appropriately respond to the ever-changing nature of environmental conditions. We refer to the cleaner as an “agent.” During learning, an agent experiences a series of states (sets of clients), chooses actions (which client to clean), and obtains food, which triggers rewards. The update rule used by both modules is based on the prediction error, a quantity measuring the mismatch between the obtained and the expected reward. Learning agents update their estimate of value for a state every time they face it, and they update their preference for an action every time they take it. We are interested in how the agents develop a preference for one of the two options, not in the overall estimation process; the exact magnitude of the initial value is less important than the fact that all of the states that involve at least one client are initialized with the same value. These values are initialized depending on the environmental and cognitive parameters.
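This update rule can be written down compactly. The sketch below is a minimal actor-critic implementation of a temporal-difference prediction error, reusing `sample_state` and `step` from the task sketch above; the learning rate `alpha`, the discount factor `gamma`, and the softmax choice rule are illustrative assumptions, not the paper's exact settings.

```python
import math
import random
from collections import defaultdict

alpha, gamma = 0.05, 0.9   # assumed learning rate and future discounting

V = defaultdict(float)     # critic: estimated value of each state
pref = defaultdict(float)  # actor: preference for each (state, action) pair

def choose(state):
    """Softmax choice between cleaning slot 0 or slot 1."""
    weights = [math.exp(pref[(state, a)]) for a in (0, 1)]
    return random.choices((0, 1), weights=weights)[0]

def update(state, action, reward, next_state):
    """Prediction error: mismatch between obtained and expected reward."""
    delta = reward + gamma * V[next_state] - V[state]
    V[state] += alpha * delta               # value updated each time a state is faced
    pref[(state, action)] += alpha * delta  # preference updated each time an action is taken

# One learning episode over the market task sketched earlier.
state = sample_state()
for _ in range(10_000):
    action = choose(state)
    reward, next_state = step(state, action)
    update(state, action, reward, next_state)
    state = next_state
```

With `gamma > 0` the agent credits its current choice with the future food made available by a waiting resident or lost with a departing visitor; setting `gamma = 0` and instead delivering a negative reward when a visitor leaves corresponds to the alternative mechanism mentioned in the abstract, experiencing a client's departure as a penalty.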

