Abstract

This paper addresses a class of sequential optimization problems known as Markov decision processes. These processes are considered on Euclidean state and action spaces with the total expected discounted cost as the objective function. The main goal of the paper is to provide conditions that guarantee an adequate Moreau-Yosida regularization for Markov decision processes (named the original process). In this way, a new Markov decision process is established that coincides with the Markov control model of the original process except for the cost function, which is induced via the Moreau-Yosida regularization. Compared to the original process, this new discounted Markov decision process has richer properties, such as the differentiability and strict convexity of its optimal value function and the uniqueness of its optimal policy; moreover, the optimal value function and the optimal policy of both processes coincide. To complement the theory presented, an example is provided.
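For reference, the total expected discounted cost criterion mentioned above is the standard one for discounted MDPs; in generic notation (which may differ from the paper's), for a policy π, an initial state x, a one-stage cost c, and a discount factor α ∈ (0,1), the criterion and the optimal value function are

```latex
V(\pi, x) = \mathbb{E}^{\pi}_{x}\!\left[\sum_{t=0}^{\infty} \alpha^{t}\, c(x_{t}, a_{t})\right],
\qquad
V^{*}(x) = \inf_{\pi} V(\pi, x).
```

It is this optimal value function whose differentiability and strict convexity are obtained for the perturbed model.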

Highlights

  • This article addresses discounted Markov decision processes (MDPs) defined on Euclidean state and action spaces

  • This proposal is the first step toward applying numerical methods based on the Moreau-Yosida regularization, for example gradient methods or bundle methods, in the context of MDPs (a minimal gradient-step sketch follows this list)

  • These numerical methods would then be used to approximate the optimal policy of MDPs

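The following is a minimal sketch, not taken from the paper, of the kind of gradient method the second highlight refers to: plain gradient descent on the Moreau envelope of a nonsmooth convex cost, using the identity that the envelope's gradient at a is (a − prox_{λc}(a))/λ. The cost c(a) = |a − 2|, the grid-based proximal map, and all numerical parameters are illustrative assumptions.

```python
import numpy as np

# Illustrative nonsmooth convex one-stage cost (not from the paper).
def c(a):
    return np.abs(a - 2.0)

def prox(a, lam):
    """Proximal map of c at a, approximated by a grid search."""
    grid = np.linspace(-10.0, 10.0, 200_001)
    return grid[np.argmin(c(grid) + (grid - a) ** 2 / (2.0 * lam))]

def envelope_grad(a, lam):
    """Gradient of the Moreau envelope of c: (a - prox_{lam c}(a)) / lam."""
    return (a - prox(a, lam)) / lam

# Plain gradient descent on the smooth envelope; it converges to the
# minimizer of the original nonsmooth cost (a = 2 here).
a, lam, step = -5.0, 0.5, 0.4
for _ in range(200):
    a -= step * envelope_grad(a, lam)
print(round(a, 3))  # approximately 2.0
```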

Summary

Introduction

This article addresses discounted Markov decision processes (MDPs) defined on Euclidean state and action spaces (see [12] and [13]). For this class of MDPs, the paper proposes conditions on the components of the Markov control model that guarantee the applicability of the Moreau-Yosida regularization. It is worth mentioning that this type of regularization has not yet been applied to MDPs. The main idea of the methodology is the following: given a discounted Markov decision process (named the original process or the original model) that satisfies certain assumptions, the Moreau-Yosida regularization is applied to its cost function, which allows a new MDP, designated the perturbed MDP, to be established.
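For orientation (generic notation, not necessarily the paper's), the Moreau-Yosida regularization, or Moreau envelope, of a function f on a Euclidean space with parameter λ > 0 is

```latex
f_{\lambda}(x) = \inf_{y \in \mathbb{R}^{n}}
\left\{ f(y) + \frac{1}{2\lambda}\, \lVert x - y \rVert^{2} \right\},
\qquad \lambda > 0.
```

When f is proper, convex, and lower semicontinuous, f_λ is finite-valued, convex, and continuously differentiable, with ∇f_λ(x) = (x − prox_{λf}(x))/λ, and it has the same infimum and the same set of minimizers as f. These classical properties are what make the regularized cost a natural candidate for building the perturbed MDP.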

Moreau-Yosida Regularization
The Moreau-Yosida Regularization Applied to MDPs
Application to MDPs
Convexity and Differentiability in the Perturbed Model
Example
Final Remarks