Abstract

In the framework of reinforcement learning, an agent learns an optimal policy by maximizing return rather than by following choices instructed by a supervisor. The framework is generally formulated as an ergodic Markov decision process, and the parameters of the action-selection strategy are tuned so that the learning process eventually becomes almost stationary. In this paper, we examine a theoretically more general class of processes in which the agent can still achieve return maximization, by considering the asymptotic equipartition property of such processes. As a result, we derive several necessary conditions that the agent and the environment must satisfy for return maximization to be possible.
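
As a point of reference, a minimal sketch of the asymptotic equipartition property for a stationary ergodic process is given below; the notation (a process $X_1, X_2, \ldots$ with entropy rate $H$) is assumed here for illustration, since the abstract does not fix one:

\[
  -\frac{1}{n}\log p(X_1, X_2, \ldots, X_n) \;\longrightarrow\; H \qquad \text{almost surely as } n \to \infty,
\]

so that, for large $n$, the observed trajectories concentrate on a typical set of roughly $2^{nH}$ sequences, each with probability close to $2^{-nH}$.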
