Structure learning in human sequential decision-making

Abstract

Humans perform sequential decision-making under uncertainty every day, choosing products, services, careers, and jobs. Studies of sequential decision-making in humans frequently find suboptimal performance relative to an ideal actor that knows the task generating reward in the environment. This has led to conclusions about how we explore new courses of action and exploit what we have learned. We argue, however, that humans have uncertainty about both the task and the environmental structure, and that task and structure learning can better explain how people schedule actions, including behaviors previously deemed suboptimal. We illustrate the task structure learning problem with an important special case that controls optimal exploration/exploitation. In particular, we formulate the structure learning problem using mixtures of two reward models (two-arm and one-arm bandit models) and solve optimal action selection using Bayesian reinforcement learning. These two reward models represent extremes in both the exploration-exploitation tradeoff and computational difficulty: one model must balance exploration against exploitation and use long future horizons to compute actions, while the other needs no look-ahead and its action selection is greedy. In simulations, we show that optimal learning under uncertainty about the task structure can produce a range of qualitative behaviors deemed suboptimal in previous studies of sequential binary choice.

In our experiments, each of 16 subjects (8 females) completed 32 bandit tasks: a block of 16 two-arm bandits and a block of 16 one-arm bandits. Within blocks, the presentation order was randomized, and the order of the one-arm bandits was randomized across subjects. On average, each task required 48 choices. Subjects made 1194 choices across the 16 two-arm bandit tasks and 925 across the one-arm bandit tasks. Our results show that humans rapidly learn and exploit new reward structure: human behavior tracks the behavior of our structure learning model but is not explained by models that assume the task is known.

Other kinds of reward structure learning may account for a broad variety of human decision-making performance. In particular, allowing dependence between the probability of reward at a site and previous actions can produce large changes in decision-making behavior. For instance, in a "foraging" model where reward is collected from a site and probabilistically replenished, optimal strategies produce choice sequences that alternate between reward sites. Thus, uncertainty about whether reward is independent of previous actions can produce a continuum of behavior, from maximization to probability matching. Instead of explaining behavior in terms of the idiosyncrasies of a learning rule, structure learning constitutes a fully rational response to uncertainty about the causal structure of rewards in the environment. Our hope is that, by expanding the range of normative hypotheses for human decision-making, we can begin to develop more principled accounts of human sequential decision-making behavior.
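The abstract describes the model only at a high level; no equations or code accompany it. As a minimal sketch of the general kind of computation involved, the snippet below weighs two stand-in structural hypotheses about a two-option Bernoulli reward task (independent per-arm reward rates versus a single shared rate) by their Beta-Bernoulli marginal likelihoods. The hypotheses, priors, counts, and all names are illustrative assumptions, not the authors' two-arm/one-arm mixture model or their Bayesian reinforcement learning solution.

```python
import math

def beta_bernoulli_log_ml(successes, failures, a=1.0, b=1.0):
    """Log marginal likelihood of a specific sequence of Bernoulli outcomes
    under a Beta(a, b) prior on the reward probability:
    log B(a + s, b + f) - log B(a, b)."""
    lg = math.lgamma
    return (lg(a + b) - lg(a) - lg(b)
            + lg(a + successes) + lg(b + failures)
            - lg(a + b + successes + failures))

def structure_posterior(counts):
    """Posterior over two hypothetical reward structures given per-arm
    (successes, failures) counts. These hypotheses are illustrative
    stand-ins, not the paper's exact models:
      H1: each arm has its own unknown reward probability.
      H2: both arms share a single unknown reward probability."""
    log_ml_h1 = sum(beta_bernoulli_log_ml(s, f) for s, f in counts)
    pooled_s = sum(s for s, _ in counts)
    pooled_f = sum(f for _, f in counts)
    log_ml_h2 = beta_bernoulli_log_ml(pooled_s, pooled_f)
    # Equal prior over structures; normalize in log space for stability.
    m = max(log_ml_h1, log_ml_h2)
    w1, w2 = math.exp(log_ml_h1 - m), math.exp(log_ml_h2 - m)
    return w1 / (w1 + w2), w2 / (w1 + w2)

# Example: arm 0 rewarded 9 of 12 times, arm 1 rewarded 2 of 10 times.
p_h1, p_h2 = structure_posterior([(9, 3), (2, 8)])
print(f"P(independent arms | data) = {p_h1:.3f}")
print(f"P(shared rate | data)      = {p_h2:.3f}")
```

With the example counts above, the posterior favors the independent-arms hypothesis, mirroring how accumulating evidence can disambiguate reward structure before the decision-maker commits to a policy.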
Conference: Computational and Systems Neuroscience 2009, Salt Lake City, UT, United States, 26 Feb - 3 Mar 2009.
Presentation Type: Poster Presentation
Topic: Poster Presentations
Citation: (2009). Structure learning in human sequential decision-making. Front. Syst. Neurosci. Conference Abstract: Computational and systems neuroscience 2009. doi: 10.3389/conf.neuro.06.2009.03.238
Received: 03 Feb 2009; Published Online: 03 Feb 2009.

Highlights

  • From a squirrel deciding where to bury its nuts to a scientist selecting the next experiment, all decision-making organisms must balance exploration of alternatives against exploitation of known options in developing action plans

  • In an experimental test of structure learning in humans, we show that humans learn reward structure from experience in a near-optimal manner

  • We argue that structure learning plays a major role in human sequential decision-making

Introduction

From a squirrel deciding where to bury its nuts to a scientist selecting the next experiment, all decision-making organisms must balance exploration of alternatives against exploitation of known options in developing action plans. Determining when exploration is profitable is itself a decision problem that requires understanding or learning the statistical structure of the environment. Consider a bandit task: your aim is to maximize the total reward obtained from the environment, but the difficulty is that the rate of reward for each option is unknown and must be learned. In this simple setting, there may be several hypotheses about how the reward generation process works, that is, how actions, observations, and unknowns are structurally "connected." We propose three kinds of structures that capture several versions of the sequential decision-making tasks available in the literature. The first structure has temporal dependency between the present probability of reward and the past probability of reward, as investigated in the context of restless bandits.
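To make these structural hypotheses concrete, the sketch below simulates three ways a single reward site might generate binary rewards: an independent process with a fixed reward probability, a "restless" process whose probability drifts over time, and a foraging-style process in which harvesting depletes the site and reward probabilistically replenishes, as in the abstract's foraging example. All dynamics, parameter values, and names here are our illustrative assumptions, not specifications taken from the paper.

```python
import random

def step_independent(p, acted):
    """Classic bandit structure: the reward probability never changes,
    regardless of time or of what the decision-maker does."""
    return p

def step_restless(p, acted, drift=0.05):
    """Temporal dependency ('restless' structure): the present reward
    probability is a noisy function of the past one (clipped random walk)."""
    return min(1.0, max(0.0, p + random.uniform(-drift, drift)))

def step_foraging(p, acted, replenish=0.25, p_full=0.9):
    """Action dependency (foraging structure): harvesting empties the site,
    after which reward probabilistically builds back up."""
    if acted:
        p = 0.0          # the harvest depletes the site
    if random.random() < replenish:
        p = p_full       # the site restocks
    return p

def harvest_repeatedly(step, trials=15, p=0.5):
    """Keep harvesting a single site and record the rewards obtained."""
    rewards = []
    for _ in range(trials):
        rewards.append(int(random.random() < p))
        p = step(p, True)
    return rewards

random.seed(1)
print("independent:", harvest_repeatedly(step_independent))
print("restless:   ", harvest_repeatedly(step_restless))
print("foraging:   ", harvest_repeatedly(step_foraging))
```

Under the foraging dynamics, repeatedly harvesting one site yields long runs of misses between replenishments, which is why optimal choice sequences in such environments alternate between reward sites rather than maximizing on a single one.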

