Abstract

The classic objective in a reinforcement learning (RL) problem is to find a policy that minimizes, in expectation, a long-run cost objective such as the infinite-horizon discounted or average cost. In many practical applications, optimizing the expected value alone is not sufficient, and it may be necessary to include a risk measure in the optimization process, either as the objective or as a constraint. Various risk measures have been proposed in the literature, e.g., the mean-variance tradeoff, exponential utility, percentile performance, value-at-risk, conditional value-at-risk, prospect theory, and its later enhancement, cumulative prospect theory. In this two-part article, we are primarily concerned with the combination of risk criteria and reinforcement learning in a constrained optimization framework, i.e., a setting where the goal is to find a policy that minimizes the usual infinite-horizon discounted/average cost objective while ensuring that an explicit risk constraint is satisfied. In this part, we introduce the risk-constrained RL framework, cover popular risk measures based on variance, conditional value-at-risk, and cumulative prospect theory, and present a template for a risk-sensitive RL algorithm. In a companion paper, we survey some of our recent works on this topic, covering problems encompassing discounted, average, and stochastic shortest path settings, together with the aforementioned risk measures in a constrained framework. This non-exhaustive survey is aimed at giving a flavor of the challenges involved in solving a risk-sensitive RL problem and at outlining a few challenging future research directions.
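To make the constrained setting concrete, the following is a minimal sketch of one of the risk measures mentioned above, conditional value-at-risk (CVaR), estimated empirically from sampled costs, together with the Lagrangian relaxation commonly used to turn the constrained problem into an unconstrained saddle-point objective. The function names `empirical_cvar` and `lagrangian` are illustrative, not from the paper.

```python
import numpy as np

def empirical_cvar(costs, alpha):
    """Empirical conditional value-at-risk of a cost sample.

    VaR_alpha is the alpha-quantile of the cost distribution; CVaR_alpha
    is the expected cost conditional on the cost exceeding that quantile.
    """
    costs = np.asarray(costs, dtype=float)
    var = np.quantile(costs, alpha)   # value-at-risk at level alpha
    tail = costs[costs >= var]        # costs in the upper (bad) tail
    return tail.mean()

def lagrangian(expected_cost, risk, threshold, lam):
    """Lagrangian relaxation of the risk-constrained problem:
        minimize E[cost]  subject to  risk(cost) <= threshold
    relaxed to a saddle-point objective in (policy, lambda >= 0).
    """
    return expected_cost + lam * (risk - threshold)
```

A risk-sensitive RL algorithm of the kind templated in the paper would typically ascend in the multiplier `lam` while descending in the policy parameters, using sample-based estimates such as `empirical_cvar` in place of the true risk.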
