Abstract

Off-policy evaluation is an important topic in reinforcement learning: it estimates the expected cumulative reward of a target policy from logged trajectory data generated by a different behavior policy, without executing the target policy. It is imperative to quantify the uncertainty of the off-policy estimate before the target policy is deployed. Here we leverage methodologies from (Wasserstein) distributionally robust optimization to provide robust and optimistic cumulative reward estimates. With proper selection of the size of the distributional uncertainty set, these estimates serve as confidence bounds with non-asymptotic and asymptotic guarantees under stochastic or adversarial environments. We also generalize these results to batch reinforcement learning.
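To make the off-policy evaluation setup concrete, the sketch below shows a standard trajectory-wise importance-sampling estimate of a target policy's expected cumulative reward from behavior-policy logs. It illustrates only the basic estimation problem the abstract refers to, not the paper's distributionally robust bounds; the function and argument names (`is_ope_estimate`, `pi_target`, `pi_behavior`) are hypothetical placeholders.

```python
import numpy as np

def is_ope_estimate(trajectories, pi_target, pi_behavior, gamma=1.0):
    """Importance-sampling estimate of the target policy's expected return.

    trajectories: iterable of trajectories, each a list of (state, action, reward).
    pi_target, pi_behavior: functions mapping (state, action) to the probability
        of taking `action` in `state` under the respective policy.
    Names and interfaces here are illustrative assumptions, not from the paper.
    """
    returns = []
    for traj in trajectories:
        weight, ret, discount = 1.0, 0.0, 1.0
        for state, action, reward in traj:
            # Cumulative importance weight corrects for the mismatch between
            # the behavior policy that generated the data and the target policy.
            weight *= pi_target(state, action) / pi_behavior(state, action)
            ret += discount * reward
            discount *= gamma
        returns.append(weight * ret)
    # Point estimate and its standard error; the paper instead derives
    # confidence bounds via distributionally robust optimization.
    return np.mean(returns), np.std(returns) / np.sqrt(len(returns))
```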
