Abstract

In this paper, we study the nonasymptotic and asymptotic performance of the optimal robust policy and value function of robust Markov decision processes (MDPs), where the optimal robust policy and value function are estimated from a generative model. While prior work on the nonasymptotic performance of robust MDPs is restricted to the setting of a KL uncertainty set under the (s,a)-rectangular assumption, we improve those results and also consider other uncertainty sets, including the L₁ and χ² balls. Our results show that under the (s,a)-rectangular assumption on the uncertainty set, the sample complexity is about Õ(|S|²|A| / (ε²ρ²(1−γ)⁴)). In addition, we extend our results from the (s,a)-rectangular assumption to the s-rectangular assumption. In this scenario, the sample complexity varies with the choice of uncertainty set and is generally larger than in the (s,a)-rectangular case. Moreover, we show, both theoretically and empirically, that the optimal robust value function is asymptotically normal at the typical √n rate under both the (s,a)- and s-rectangular assumptions.
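To make the setting concrete, below is a minimal sketch, not the paper's algorithm, of robust value iteration under an (s,a)-rectangular L₁ uncertainty ball centered at an empirical transition kernel estimated from a generative model. The function names, the radius parameter rho (ρ), and the discount gamma (γ) are illustrative assumptions; the inner minimization uses the standard greedy solution for an L₁ ball intersected with the probability simplex (shift up to ρ/2 mass from the highest-value states to the lowest-value state).

```python
import numpy as np

def worst_case_expectation_l1(p_hat, v, rho):
    """min_{p in simplex, ||p - p_hat||_1 <= rho} <p, v>.

    Greedy solution: move up to rho/2 probability mass from the
    states with the largest v to the state with the smallest v.
    """
    p = p_hat.astype(float).copy()
    budget = rho / 2.0                       # L1 radius rho allows rho/2 mass moved
    i_min = np.argmin(v)
    for i in np.argsort(v)[::-1]:            # states with the largest v first
        if budget <= 1e-12:
            break
        if i == i_min:
            continue
        moved = min(p[i], budget)
        p[i] -= moved
        p[i_min] += moved
        budget -= moved
    return float(p @ v)

def robust_value_iteration(P_hat, R, rho, gamma, iters=500):
    """(s,a)-rectangular robust value iteration on an empirical model.

    P_hat: empirical transition kernel, shape (S, A, S).
    R:     reward table, shape (S, A).
    """
    S, A, _ = P_hat.shape
    v = np.zeros(S)
    for _ in range(iters):
        q = np.array([[R[s, a] + gamma * worst_case_expectation_l1(P_hat[s, a], v, rho)
                       for a in range(A)] for s in range(S)])
        v = q.max(axis=1)                    # robust Bellman optimality backup
    return v

# Tiny usage example with a random 3-state, 2-action empirical model.
rng = np.random.default_rng(0)
P_hat = rng.dirichlet(np.ones(3), size=(3, 2))   # stands in for generative-model estimates
R = rng.uniform(size=(3, 2))
v_rob = robust_value_iteration(P_hat, R, rho=0.1, gamma=0.9)
```

Because the uncertainty set is (s,a)-rectangular, the inner minimization decouples across state-action pairs, which is what makes this per-(s,a) greedy step valid; under the s-rectangular assumption the adversary couples actions within a state and the inner problem no longer separates this way.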
