Safe Optimal Design with Applications in Policy Learning

Ruihao Zhu, Branislav Kveton

doi:10.2139/ssrn.3959086

Abstract

Motivated by practical needs in online experimentation and off-policy learning, we study the problem of safe optimal design, where we develop a data logging policy that explores efficiently while achieving rewards competitive with those of a baseline production policy. We first show, perhaps surprisingly, that the common practice of mixing the production policy with uniform exploration, despite being safe, is sub-optimal in maximizing information gain. We then propose a safe optimal logging policy for the case when no side information about the actions’ expected rewards is available. We improve upon this design by incorporating side information, and we extend both approaches to a large number of actions with a linear reward model. We analyze how our data logging policies impact errors in off-policy learning. Finally, we empirically validate the benefit of our designs through extensive experiments.
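To make the baseline practice mentioned in the abstract concrete, here is a minimal sketch of mixing a production policy with uniform exploration. It is not the paper's proposed design. The action probabilities, mean rewards, the mixing weight `epsilon`, and the safety check (expected reward at least `(1 - alpha)` times the production policy's) are all assumptions chosen for illustration; the abstract does not state the exact safety constraint.

```python
import numpy as np

def mix_with_uniform(baseline, epsilon):
    """Return (1 - epsilon) * baseline + epsilon * uniform over the same actions."""
    baseline = np.asarray(baseline, dtype=float)
    uniform = np.full_like(baseline, 1.0 / baseline.size)
    return (1.0 - epsilon) * baseline + epsilon * uniform

def expected_reward(policy, mean_rewards):
    """Expected one-step reward of a policy given per-action mean rewards."""
    return float(np.dot(policy, mean_rewards))

# Hypothetical 3-action problem (values invented for illustration).
baseline = np.array([0.80, 0.15, 0.05])    # production policy
mean_rewards = np.array([0.5, 0.4, 0.1])   # assumed non-negative mean rewards

epsilon = 0.1   # weight on uniform exploration (assumption)
alpha = 0.1     # tolerated relative reward loss (assumed form of safety)

logging_policy = mix_with_uniform(baseline, epsilon)

# Assumed safety check: keep at least (1 - alpha) of the baseline's reward.
is_safe = expected_reward(logging_policy, mean_rewards) >= (
    (1.0 - alpha) * expected_reward(baseline, mean_rewards)
)
```

With non-negative rewards, the mixed policy always earns at least `(1 - epsilon)` times the baseline's expected reward, so any `epsilon <= alpha` passes this assumed check. The abstract's point is that such a mixture, although safe, is not the most informative way to spend the exploration budget.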



More From: SSRN Electronic Journal


Similar Papers

Learning With Options That Terminate Off-Policy
Anna Harutyunyan ... Doina Precup
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 32
29 Apr 2018

Optimal Insurance for a Minimal Expected Retention: The Case of an Ambiguity-Seeking Insurer
Massimiliano Amarante ... Mario Ghossoub
SSRN Electronic Journal
27 Jan 2015

Optimal Design and Identification Problems in Nonsmooth Mechanics
Georgios E. Stavroulakis
01 Jan 2001

Off-policy learning in large-scale POMDP-based dialogue systems
Lucie Daubigney ... Matthieu Geist
01 Mar 2012
