Abstract

In this paper, we study a Markov decision process with a non-linear discount function and a Borel state space. We define a recursive discounted utility, which resembles non-additive utility functions considered in a number of models in economics. Non-additivity here follows from the non-linearity of the discount function. Our study is complementary to the work of Jaśkiewicz et al. (Math Oper Res 38:108–121, 2013), where non-linear discounting is also used in a stochastic setting, but where the expectation of utilities aggregated on the space of all histories of the process is applied, leading to a non-stationary dynamic programming model. Our aim is to prove that in the recursive discounted utility case the Bellman equation has a solution and that there exists an optimal stationary policy for the problem over the infinite time horizon. Our approach covers two cases: (a) when the one-stage utility is bounded on both sides by a weight function multiplied by some positive and negative constants, and (b) when the one-stage utility is unbounded from below.
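For orientation, the Bellman equation referred to above can be written schematically (this is a sketch under the paper's assumptions on δ, the one-stage utility u, the admissible action sets A(x) and the transition law q, not a verbatim statement from the paper) as

  v∗(x) = sup_{a ∈ A(x)} { u(x, a) + δ( ∫_X v∗(y) q(dy | x, a) ) },   x ∈ X,

so that the non-linear discount function δ is applied to the expected continuation value; every maximiser of the right-hand side defines an optimal stationary policy.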

Highlights

  • Discounting future benefits and costs is crucial for determining fair prices of investments and projects, in particular when long time horizons come into play, for example when pricing carbon emissions

  • We treat the model in which the positive and negative parts of the one-stage utility are bounded by a weight function ω (a schematic sketch of these bounds follows this list). We show in this case that the value function v∗ is the unique fixed point of the corresponding maximal reward operator and that every maximiser in the Bellman equation for v∗ defines an optimal stationary policy

  • Using the weight function ω, one can study dynamic programming models with an unbounded one-stage utility u. This method was introduced by Wessels [34], but as noted by van der Wal [32], in dynamic programming with a linear discount function δ(z) = βz one can introduce an extra state xe ∉ X and re-define the transition probability and the utility function to obtain an equivalent “bounded model”
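As a rough illustration of the weight-function technique mentioned above (the notation here is ours; the precise conditions appear in the paper), case (a) of the abstract assumes bounds of the form

  −c⁻ ω(x) ≤ u(x, a) ≤ c⁺ ω(x)   for all states x and admissible actions a,

with constants c⁺, c⁻ ≥ 0, and one then works with the ω-weighted supremum norm ‖v‖_ω = sup_{x ∈ X} |v(x)| / ω(x). The value function v∗ arises as the unique fixed point of the maximal reward operator on a suitable set of functions with finite ω-norm.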


Summary

Introduction

Discounting future benefits and costs is crucial for determining fair prices of investments and projects, in particular when long time horizons come into play, for example when pricing carbon emissions. We study a Markov decision process with a Borel state space, unbounded stage utility and a non-linear discount function δ which has certain properties. In Section 5.1 we treat the model in which the positive and negative parts of the one-stage utility are bounded by a weight function ω. We show in this case that the value function v∗ is the unique fixed point of the corresponding maximal reward operator and that every maximiser in the Bellman equation for v∗ defines an optimal stationary policy. We discuss two different optimal growth models, an inventory problem and a stopping problem.
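To make the recursion concrete, the following Python sketch runs value iteration on a toy finite model in which a non-linear discount function is applied to the expected continuation value. All choices below (the state and action sets, the utility u, the transition kernel q and the function delta) are our own illustrative assumptions and are not taken from the paper, which treats Borel state spaces and possibly unbounded utilities.

# Hedged sketch: value iteration for a toy finite-state, finite-action MDP with a
# non-linear discount function applied to the expected continuation value.
import numpy as np

n_states, n_actions = 4, 2
rng = np.random.default_rng(0)

# u[x, a]: one-stage utility; q[x, a, y]: transition probabilities (rows sum to 1).
u = rng.uniform(-1.0, 1.0, size=(n_states, n_actions))
q = rng.uniform(size=(n_states, n_actions, n_states))
q /= q.sum(axis=2, keepdims=True)

def delta(z):
    # Illustrative non-linear discount function: increasing, delta(0) = 0,
    # and Lipschitz with constant 0.9, so the iteration below converges.
    return 0.9 * np.sign(z) * np.log1p(np.abs(z))

def bellman(v):
    # (Tv)(x) = max_a { u(x, a) + delta( sum_y q(y | x, a) v(y) ) }
    continuation = q @ v              # expected continuation value, shape (n_states, n_actions)
    return np.max(u + delta(continuation), axis=1)

v = np.zeros(n_states)
for _ in range(1000):
    v_new = bellman(v)
    converged = np.max(np.abs(v_new - v)) < 1e-10
    v = v_new
    if converged:
        break

# A maximiser in the (approximate) Bellman equation defines a stationary policy.
policy = np.argmax(u + delta(q @ v), axis=1)
print("approximate value function:", np.round(v, 4))
print("stationary policy:", policy)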

The Dynamic Programming Model
Assumptions with Comments
Discounted Utility Evaluations
Auxiliary Results
One-Period Utilities with Bounds on Both Sides
One-Period Utilities Unbounded from Below
Policy Iteration
Howard’s Policy Improvement Algorithm
Applications