Abstract

This thesis investigates risk-sensitive sequential decision-making problems in an uncertain environment. We first introduce the axiomatic concept of valuation functions that generalize known concepts of risk measures in mathematical finance to cover most of the existing risk-related models in various fields, in particular, behavioral economics and cognitive neuroscience. By applying this concept to Markov processes, we construct valuation maps and develop thereby a unified framework for incorporating risk into Markov decision processes on general spaces. Within the framework, we study mainly two types of infinite-horizon risk-sensitive criteria, discounted and average valuations, and solve the associated optimization problems by value iteration. For the discounted case, we propose a new discount scheme, which is different from the conventional form but consistent with existing literature, while for the average criterion, we state Lyapunov-type stability conditions that generalize known conditions for Markov chains to ensure the existence of solutions to the optimality equation and a geometric convergence rate for the value iteration. Applying a set of valuation functions, called utility-based shortfall, we derive a family of model-free risk-sensitive reinforcement learning algorithms for solving the optimization problems corresponding to risk-sensitive valuations. In addition, we find that when appropriate utility functions are chosen, agents’ behaviors express key features of human behavior as predicted by prospect theory, for example, different risk preferences for gains and losses, as well as the shape of subjective probability curves. As a proof of principle for the applicability of the new algorithms, we apply them to two tasks: 1) to quantify human behavior in a sequential investment task and 2) to perform risk control in simulated algorithmic trading of stocks.
In the first task, the risk-sensitive variant provides a significantly better fit to the behavioral data and leads to an interpretation of the subject’s responses that is consistent with prospect theory. The analysis of simultaneously measured fMRI signals shows a significant correlation of the risk-sensitive temporal difference error with BOLD signal change in the ventral striatum. In the second task, our algorithm outperforms the risk-neutral reinforcement learning algorithm by keeping the trading cost at a substantially low level at the point when the 2010 Flash Crash occurred, and by significantly reducing the risk over the whole test period.
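
To make the idea of a model-free risk-sensitive update concrete, the following is a minimal sketch, not the thesis's actual algorithm. It assumes one common way of building such an update: a tabular Q-learning step in which a utility function is applied to the temporal difference error before it is used. The function names, the piecewise-linear utility, its slopes, and the toy two-state table are all hypothetical choices made for illustration, and the sketch uses the conventional discount factor rather than the discount scheme proposed in the thesis.

```python
from typing import Callable, List


def piecewise_linear_utility(td_error: float,
                             k_gain: float = 0.5,
                             k_loss: float = 1.5) -> float:
    """Hypothetical utility with u(0) = 0.

    Negative TD errors (losses) are weighted more heavily than
    positive ones (gains), which yields risk-averse behavior.
    """
    return k_gain * td_error if td_error >= 0 else k_loss * td_error


def risk_sensitive_q_update(q: List[List[float]],
                            state: int,
                            action: int,
                            reward: float,
                            next_state: int,
                            alpha: float = 0.1,
                            gamma: float = 0.9,
                            utility: Callable[[float], float] = piecewise_linear_utility) -> float:
    """One tabular Q-learning step with a utility applied to the TD error.

    Returns the raw (untransformed) TD error. With utility(x) = x this
    reduces to the standard risk-neutral Q-learning update.
    """
    td_error = reward + gamma * max(q[next_state]) - q[state][action]
    q[state][action] += alpha * utility(td_error)
    return td_error


# Toy example (assumed setting): two states, two actions, all values start at zero.
q_table = [[0.0, 0.0], [0.0, 0.0]]
gain_error = risk_sensitive_q_update(q_table, 0, 0, reward=1.0, next_state=1)
loss_error = risk_sensitive_q_update(q_table, 0, 1, reward=-1.0, next_state=1)
print(q_table[0])
```

In this toy run, a gain and a loss of equal size move the value estimates by different amounts, because the assumed utility is steeper for negative temporal difference errors. This asymmetry is the kind of gain/loss distinction the abstract relates to prospect theory.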
