Abstract

Animals, humans, and organizations are known to adjust how, and how much, they explore complex environments that exceed their information-processing capacity, rather than relentlessly searching for the optimal action. The depth of exploration is thought to depend on an aspiration level internal to the agent. This action-selection tendency is known as satisficing. The Risk-sensitive Satisficing (RS) model implements satisficing in the reinforcement learning framework by converting action values into gains (or losses) relative to the aspiration level. This risk-sensitive evaluation of action values has been shown to be effective in reinforcement learning. In this paper, we first analyze RS in comparison with the UCB and Thompson sampling algorithms, and show that RS exhibits differential risk attitudes depending on the risks involved. We then propose the Softsatisficing policy, a stochastic equivalent of RS, and further analyze the exploratory behavior of the risk-sensitive satisficing that RS and Softsatisficing implement. We emphasize that Softsatisficing has the potential to model risk-sensitive foraging and other decision-making behaviors of humans, animals, and organizations.

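To make the idea concrete, below is a minimal, hypothetical sketch for a K-armed bandit. It assumes that RS scores each arm by its relative selection frequency (a reliability weight) times the gain or loss of the arm's estimated value relative to the aspiration level, and that Softsatisficing turns those scores into a softmax distribution; the function names, the temperature parameter, and the exact functional forms here are illustrative assumptions, not the paper's definitive formulation.

```python
import numpy as np

def rs_scores(q, n, aleph):
    """Hypothetical RS-style scoring for a K-armed bandit (illustrative only).

    q     : estimated value of each arm
    n     : number of times each arm has been selected
    aleph : aspiration level (satisficing target)

    Each arm's estimated value is converted into a gain or loss relative
    to the aspiration level and weighted by the arm's relative selection
    frequency; a deterministic RS agent would pick the arm with the
    highest score.
    """
    tau = n / max(n.sum(), 1)      # reliability weight of each estimate
    return tau * (q - aleph)       # gain/loss relative to aspiration level

def softsatisficing_probs(q, n, aleph, temperature=0.1):
    """Assumed stochastic counterpart: softmax over the RS-style scores."""
    v = rs_scores(q, n, aleph) / temperature
    v = v - v.max()                # shift for numerical stability
    p = np.exp(v)
    return p / p.sum()

# Example: two arms with an aspiration level between their estimated values
q = np.array([0.4, 0.7])
n = np.array([10, 5])
print(rs_scores(q, n, aleph=0.6))
print(softsatisficing_probs(q, n, aleph=0.6))
```

Under this assumed form, when every estimated value falls below the aspiration level the scores are negative and less-tried arms are penalized less, which encourages exploration; once some arm's value exceeds the aspiration level, the reliability weight favors exploiting it.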