A Sublinear-Regret Reinforcement Learning Algorithm on Constrained Markov Decision Processes with reset action

Takashi Watanabe, Takashi Sakuragawa

doi:10.1145/3380688.3380706

Publication Date: Jan 17, 2020

Affiliation: Kyoto University

#Reset Action #Constrained Markov Decision Processes

Abstract

In this paper, we study model-based reinforcement learning in unknown constrained Markov decision processes (CMDPs) with a reset action. We propose an algorithm, Constrained-UCRL, which uses confidence intervals as in UCRL2 and solves a linear programming problem to compute a policy at the start of each episode. We show that, with high probability, Constrained-UCRL achieves sublinear regret bounds of $O(S A^{1/2} T^{3/4})$, up to logarithmic factors, for both the gain and the constraint violations.
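To make the phrase "confidence intervals as in UCRL2" concrete, here is a minimal sketch of a UCRL2-style confidence set for one state-action pair. The abstract does not state the intervals Constrained-UCRL actually uses, so the constant 14 and the logarithmic term are taken from the UCRL2 analysis (Jaksch et al., 2010) as an assumption. The function names `empirical_distribution` and `l1_confidence_radius` are hypothetical and not from the paper.

```python
import math


def empirical_distribution(next_state_counts):
    """Empirical next-state distribution for one (state, action) pair.

    Falls back to the uniform distribution when the pair is unvisited.
    """
    total = sum(next_state_counts)
    if total == 0:
        n = len(next_state_counts)
        return [1.0 / n] * n
    return [count / total for count in next_state_counts]


def l1_confidence_radius(num_states, num_actions, t, visits, delta):
    """UCRL2-style L1 radius around the empirical transition vector.

    Assumption: constants follow UCRL2, not necessarily Constrained-UCRL.
    t is the current time step (t >= 1), visits is the number of times the
    pair has been tried, and delta in (0, 1) is the failure probability.
    """
    log_term = math.log(2 * num_actions * t / delta)
    return math.sqrt(14 * num_states * log_term / max(1, visits))


# Counts of observed next states for one (state, action) pair, 3 states.
p_hat = empirical_distribution([2, 6, 0])
radius = l1_confidence_radius(num_states=3, num_actions=2, t=100,
                              visits=8, delta=0.05)
```

Under this assumption, any transition vector within L1 distance `radius` of `p_hat` is treated as plausible, and the radius shrinks at rate $1/\sqrt{N}$ as the visit count $N$ grows. The linear program solved at the start of each episode would then be posed over such plausible models. Its exact form is not given in the abstract.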
