Abstract

This paper solves the two-armed bandit problem when decision makers are risk averse. It shows, counterintuitively, that a more risk-averse decision maker might be more willing to take risky actions. The reason is that pulling the risky arm in bandit models produces information on the environment, thereby reducing the risk that a decision maker will face in the future. This finding gives reason for caution when inferring risk preferences from observed actions: in a bandit setup, observing a greater appetite for risky actions can actually be indicative of more risk aversion, not less. Studies that do not take this into account may produce biased estimates.

Highlights

  • This paper analyzes how a rational, risk-averse decision maker solves the two-armed bandit problem of having to choose between a safe alternative that yields a known reward and a risky one that generates an unknown payoff. At first sight, it seems intuitive that decision makers who are more risk averse will be less willing to take the risky action.

  • The point of this paper is to show that the introduction of learning and experimentation can overturn this conventional wisdom: appointing a more risk-averse decision maker is no guarantee that less risky actions will be implemented.

  • The counterintuitive part (b) of Proposition 1 is seemingly at odds with the result of Chancelier et al. (2009), who conclude that more risk-averse DMs are always more likely to pull the safe arm in bandit problems.


Summary

Introduction

This paper analyzes how a rational, risk-averse decision maker solves the two-armed bandit problem of having to choose between a safe alternative that yields a known reward and a risky one that generates an unknown payoff. At first sight, it seems intuitive that a more risk-averse decision maker will be less willing to take the risky action. We uncover the previously overlooked result that a more risk-averse decision maker might in fact be more willing to pull the risky arm than a less risk-averse colleague. The reason for this counterintuitive result is that risk in bandit models can be reduced through experimentation with the risky arm. Learning and experimentation can thus overturn the conventional wisdom: appointing a more risk-averse decision maker is no guarantee that less risky actions will be implemented. Studies that do not take this into account may produce biased estimates of risk preferences.
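The information channel described above can be illustrated with a toy calculation. The sketch below is not the paper's model: it assumes a two-period horizon, a risky arm whose type (paying `high` or `low`) is fully revealed by a single pull, and CARA utility that is additive across periods. The function names `cara` and `two_period_values` and all parameter values are hypothetical. It only shows why pulling the risky arm can be worthwhile even when its one-period expected utility is below that of the safe arm; it does not attempt to reproduce the comparative statics in risk aversion.

```python
import math


def cara(x, a):
    """CARA utility u(x) = -exp(-a * x); a larger `a` means more risk aversion (assumed form)."""
    return -math.exp(-a * x)


def two_period_values(p, s, high, low, a, delta=1.0):
    """Return (value of pulling risky first, value of pulling safe first).

    Assumptions (illustrative, not from the paper): the risky arm pays `high`
    in every period with prior probability p and `low` otherwise, one pull
    reveals which, and utility is CARA and additive across the two periods.
    """
    u_s, u_h, u_l = cara(s, a), cara(high, a), cara(low, a)
    # One-period expected utility of the risky arm under the prior.
    myopic_risky = p * u_h + (1 - p) * u_l
    # Risky first: the type is learned, so period 2 picks the better arm for each type.
    risky_first = myopic_risky + delta * (p * max(u_h, u_s) + (1 - p) * max(u_l, u_s))
    # Safe first: nothing is learned, so period 2 is a myopic choice under the prior.
    safe_first = u_s + delta * max(u_s, myopic_risky)
    return risky_first, safe_first


if __name__ == "__main__":
    p, s, high, low, a = 0.65, 1.0, 2.0, 0.0, 1.0
    risky_first, safe_first = two_period_values(p, s, high, low, a)
    myopic_risky = p * cara(high, a) + (1 - p) * cara(low, a)
    print("one-period: risky", myopic_risky, "vs safe", cara(s, a))
    print("two-period: risky first", risky_first, "vs safe first", safe_first)
```

With these hypothetical parameters, the risky arm has lower one-period expected utility than the safe arm, yet pulling it first gives the higher two-period value, because the second-period choice is made after the uncertainty has been resolved.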

A bandit model with non-linear utility
The effects of risk aversion
Discussion
