Abstract

An n-armed bandit task was used to investigate the trade-off between exploratory choice (choosing lesser-known options) and exploitive choice (choosing options with the greatest known probability of reinforcement) in a trial-and-error human learning problem. A different probability of reinforcement was assigned to each of eight response options using random-ratio (RR) schedules, and participants chose by mouse-clicking buttons arranged in a circular display on a computer screen. To differentially increase exploration, relative-frequency thresholds were randomly assigned to participants and acted as task constraints limiting the proportion of total responses that could be allocated to any single response option. The potential benefit of increased exploration in non-stationary environments was investigated by changing payoff probabilities so that the leanest options became the richest or the richest options became the leanest. On average, forcing participants to explore at moderate to high levels always resulted in their earning less reinforcement, even when the payoffs changed. This outcome may be due to humans' natural level of exploration in our task being sufficiently high to create sensitivity to environmental dynamics.
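
To make the task mechanics concrete, the sketch below simulates the setup as described: eight options on RR schedules, a relative-frequency threshold that blocks over-sampled options, and a mid-session reversal that swaps rich and lean payoffs. All numerical values (the payoff probabilities, threshold, trial counts, and reversal point) and the random responder are illustrative assumptions, not the study's parameters.

```python
import random

# Minimal simulation sketch of the task described above. All numbers here
# (payoff probabilities, threshold, trial counts, reversal point) and the
# random responder are illustrative assumptions, not the study's parameters.

N_OPTIONS = 8
N_TRIALS = 400
THRESHOLD = 0.30       # hypothetical cap on any option's share of responses
REVERSAL_TRIAL = 200   # hypothetical point where payoffs are reversed

# One reinforcement probability per option; on a random-ratio (RR)
# schedule each response is reinforced with probability 1/ratio.
probs = [1 / r for r in (2, 3, 4, 6, 8, 12, 16, 24)]

counts = [0] * N_OPTIONS   # responses allocated to each option so far

def allowed(option, trial):
    """An option stays available only while choosing it again would keep
    its share of total responses within the threshold."""
    return counts[option] + 1 <= max(1, THRESHOLD * (trial + 1))

earned = 0
for t in range(N_TRIALS):
    if t == REVERSAL_TRIAL:
        probs.reverse()  # leanest options become richest, and vice versa
    candidates = [o for o in range(N_OPTIONS) if allowed(o, t)]
    choice = random.choice(candidates)  # stand-in for a participant's choice
    counts[choice] += 1
    earned += random.random() < probs[choice]

print(f"reinforcers earned: {earned} of {N_TRIALS} responses")
```

With a threshold of 0.30 across eight options, the caps can never all bind at once (8 × 0.30 > 1), so some option is always available; lowering the threshold toward 1/8 forces ever more uniform sampling, which is the constraint's intended effect.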

Highlights

  • An n-armed bandit task was used to investigate the trade-off between exploratory and exploitive human choice in a trial-and-error learning problem

  • A different probability of reinforcement was assigned to each of eight response options using random-ratio (RR) schedules, and participants chose by mouse-clicking buttons arranged in a circular display on a computer screen

  • Relative frequency thresholds were randomly assigned to each participant and acted as task constraints limiting the proportion of total responses that could be attributed to any response option


Summary


Altering task complexity often produces a fundamental change in the task being performed. In contrast to such approaches, introducing a task constraint directly and cleanly manipulates the level of exploration: the reinforcement probability for each response option is affected only by the assigned payoff schedule and the sampling history of that option, and the chooser is prevented from overexploiting particular alternatives. The differences between bandit-arm payoffs should not be so discriminable that the options with the highest probabilities are quickly identified and exclusively exploited, which would prevent the examination of learning. Likewise, they should not be so difficult to discriminate that exploration remains very high throughout the task and preference for the best option(s) develops very slowly or not at all. The goal was to select payoff ratios that, when assigned to the eight options, allowed participants to demonstrate moderately paced learning of the prevailing contingencies.
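
As a hedged illustration of this discriminability argument, the sketch below runs a simple epsilon-greedy learner (an assumption for illustration, not the study's model of participants) on two hypothetical payoff sets, one widely spaced and one tightly spaced, and reports how often the best option is chosen. Widely spaced payoffs typically yield rapid, near-exclusive preference for the best option, while tightly spaced payoffs keep choice diffuse, which is why intermediate spacing is needed to observe moderately paced learning.

```python
import random

# Illustrative sketch, not the paper's model: an epsilon-greedy learner on
# two hypothetical payoff sets, to show how the spacing of reinforcement
# probabilities governs how quickly preference for the best option develops.

def run(probs, n_trials=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n = len(probs)
    counts = [0] * n      # samples per option
    means = [0.0] * n     # running estimate of each option's payoff
    best = probs.index(max(probs))
    best_picks = 0
    for _ in range(n_trials):
        if rng.random() < epsilon:
            choice = rng.randrange(n)         # explore a random option
        else:
            choice = means.index(max(means))  # exploit the best estimate
        reward = 1 if rng.random() < probs[choice] else 0
        counts[choice] += 1
        means[choice] += (reward - means[choice]) / counts[choice]
        best_picks += (choice == best)
    return best_picks / n_trials

easy = [0.50, 0.30, 0.20, 0.15, 0.10, 0.08, 0.05, 0.03]  # widely spaced
hard = [0.26, 0.25, 0.24, 0.23, 0.22, 0.21, 0.20, 0.19]  # tightly spaced

print("widely spaced set, share of best-option choices: ", run(easy))
print("tightly spaced set, share of best-option choices:", run(hard))
```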

Article Outline

  • The Task Constraint
  • Method
  • Example Response Distribution
  • Results
  • Sensitivity to RRs
  • Reinforcement Earned
  • Discussion
  • Possible Reasons why Exploration in our Task was not Beneficial

