AbstractHow humans resolve the explore–exploit dilemma in complex environments is an important open question. Previous studies suggested that environmental richness may affect the degree of exploration in a type-specific manner and reduce random exploration while increasing uncertainty-based exploration. Our study examined this possibility by extending a recently developed two-armed bandit task that can dissociate the uncertainty and novelty of stimuli. To extract the pure effect of environmental richness, we manipulated the reward by its magnitude, not its probability, across blocks because reward probability affects outcome controllability. Participants (N = 198) demonstrated increased optimal choices when the relative reward magnitude was higher. A behavioral analysis with computational modeling revealed that a higher reward magnitude reduced the degree of random exploration but had little effect on the degree of uncertainty- and novelty-based exploration. These results suggest that humans modulate their degree of random exploration depending on the relative level of environmental richness. Combined with findings from previous studies, our findings indicate the possibility that outcome controllability also influences the exploration–exploitation balance in human reinforcement learning.
Read full abstract