Abstract

We introduce the MDP-Evaluation Stopping Problem, the optimization problem faced by participants of the International Probabilistic Planning Competition 2014 who focus on their own performance. It can be cast as a meta-MDP whose actions correspond to the execution of a policy on a base MDP, but solving this meta-MDP is intractable in practice. Our theoretical analysis reveals tractable special cases in which the problem can be reduced to an optimal stopping problem. By relaxing the general problem to such a stopping problem, we derive high-quality approximate strategies, and we show both theoretically and experimentally that it not only pays off to pursue luck when executing the optimal policy, but that there are even cases where it is better to be lucky than good: executing a suboptimal base policy can be part of an optimal strategy in the meta-MDP.
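To make the stopping relaxation concrete, the following is a minimal toy sketch, not the construction from the paper. It assumes each run of a fixed base policy yields an i.i.d. Bernoulli(p) reward, the final score is the running average of completed runs, and at most N runs are allowed; the names, parameters, and the specific reward model are hypothetical. Backward induction then yields the optimal rule: stop as soon as locking in the current average beats the expected value of continuing.

```python
from functools import lru_cache

# Toy backward induction for an optimal stopping relaxation of the
# meta-MDP. Assumptions (not from the paper): i.i.d. Bernoulli(P)
# reward per run, score = running average, at most N runs.

P = 0.4   # hypothetical success probability of a single run
N = 20    # hypothetical maximum number of runs

@lru_cache(maxsize=None)
def value(total: int, n: int) -> float:
    """Optimal expected final score given `total` successes in `n` runs."""
    if n == N:                        # budget exhausted: must stop
        return total / n
    stop = total / n                  # score if we lock in the average now
    cont = P * value(total + 1, n + 1) + (1 - P) * value(total, n + 1)
    return max(stop, cont)

def should_stop(total: int, n: int) -> bool:
    """Stop as soon as the current average beats the continuation value."""
    return n == N or total / n >= (
        P * value(total + 1, n + 1) + (1 - P) * value(total, n + 1)
    )

if __name__ == "__main__":
    # After one mandatory run: keep a lucky success, retry after a failure.
    print(should_stop(1, 1), should_stop(0, 1))  # True False
```

Even this toy version exhibits the "pursue luck" effect described above: after a lucky first success the optimal rule stops immediately to preserve the above-expectation average, while after a failure it keeps executing runs.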
