An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions.

Yao Ma, Tingting Zhao, Kohei Hatano, Masashi Sugiyama

doi:10.1162/neco_a_00808

Open Access

https://doi.org/10.1162/neco_a_00808

Journal: Neural Computation
Publication Date: Mar 1, 2016
Citations: 4

Affiliation: Tokyo Institute of Technology, Tianjin University of Science and Technology, Kyushu University, The University of Tokyo

#Markov Decision Process #Algorithm For Markov Decision Processes

Abstract

We consider the learning problem under an online Markov decision process (MDP), where the aim is to learn a time-dependent decision-making policy for an agent that minimizes the regret, that is, the difference from the best fixed policy. The difficulty of online MDP learning is that the reward function changes over time. In this letter, we show that a simple online policy gradient algorithm achieves regret O(√T) for T steps under a certain concavity assumption and O(log T) under a strong concavity assumption. To the best of our knowledge, this is the first work to present an online MDP algorithm that can handle continuous state, action, and parameter spaces with a guarantee. We also illustrate the behavior of the proposed online policy gradient method through experiments.
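To make the notion of regret against the best fixed choice concrete, here is a minimal sketch. It is not the algorithm from the letter: it swaps in plain projected online gradient ascent on a one-dimensional parameter, with no states or transitions, and with hypothetical quadratic rewards r_t(theta) = -(theta - c_t)^2 that change at every step. The function names, the step size 1/sqrt(t), and the interval [-1, 1] are all assumptions made for illustration.

```python
import math
import random


def online_gradient_ascent(targets, radius=1.0):
    """Return the cumulative reward of projected online gradient ascent.

    Assumed toy setting: at step t the learner plays theta, then observes
    the concave reward r_t(theta) = -(theta - c_t)^2 for the target c_t.
    """
    theta = 0.0
    total_reward = 0.0
    for t, c in enumerate(targets, start=1):
        total_reward += -(theta - c) ** 2      # reward earned by the current theta
        grad = -2.0 * (theta - c)              # derivative of r_t at theta
        eta = 1.0 / math.sqrt(t)               # decaying step size (an assumption)
        theta = theta + eta * grad             # ascent step on the observed reward
        theta = max(-radius, min(radius, theta))  # project back onto [-radius, radius]
    return total_reward


def best_fixed_reward(targets):
    """Cumulative reward of the best fixed theta in hindsight.

    For the quadratic rewards above, the maximizer is the mean of the targets,
    which lies in [-1, 1] whenever every target does.
    """
    mean = sum(targets) / len(targets)
    return sum(-(mean - c) ** 2 for c in targets)


def regret(targets):
    """Regret: reward of the best fixed parameter minus the learner's reward."""
    return best_fixed_reward(targets) - online_gradient_ascent(targets)


if __name__ == "__main__":
    rng = random.Random(0)
    for horizon in (100, 1000, 10000):
        targets = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        print(horizon, regret(targets), math.sqrt(horizon))
```

Running the script prints the regret next to sqrt(T) for a few horizons, so the growth rates can be compared by eye. The quadratic rewards here are in fact strongly concave, so this toy problem sits in the easier of the two regimes the abstract mentions.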

