Planning under time constraints in stochastic domains

Thomas Dean, Leslie Pack Kaelbling, Jak Kirman, Ann Nicholson

doi:10.1016/0004-3702(94)00086-g

Open Access

https://doi.org/10.1016/0004-3702(94)00086-g

Journal: Artificial Intelligence
Publication Date: Jul 1, 1995
Citations: 202
License type: elsevier-specific: oa user license

Affiliation: Brown University

Abstract

We provide a method, based on the theory of Markov decision processes, for efficient planning in stochastic domains. Goals are encoded as reward functions, expressing the desirability of each world state; the planner must find a policy (mapping from states to actions) that maximizes future rewards. Standard goals of achievement, as well as goals of maintenance and prioritized combinations of goals, can be specified in this way. An optimal policy can be found using existing methods, but these methods require time at best polynomial in the number of states in the domain, where the number of states is exponential in the number of propositions (or state variables). By using information about the starting state, the reward function, and the transition probabilities of the domain, we restrict the planner's attention to a set of world states that are likely to be encountered in satisfying the goal. Using this restricted set of states, the planner can generate more or less complete plans depending on the time it has available. Our approach employs several iterative refinement routines for solving different aspects of the decision making problem. We describe the meta-level control problem of deliberation scheduling, allocating computational resources to these routines. We provide different models corresponding to optimization problems that capture the different circumstances and computational strategies for decision making under time constraints. We consider precursor models in which all decision making is performed prior to execution and recurrent models in which decision making is performed in parallel with execution, accounting for the states observed during execution and anticipating future states. We describe experimental results for both the precursor and recurrent problems that demonstrate planning times that grow slowly as a function of domain size and compare their performance to other relevant algorithms.
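
The idea of restricting the planner's attention to a subset of likely states can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: it runs ordinary value iteration over a chosen subset of states and treats every state outside that subset as a single absorbing outcome with a fixed value. The function name `restricted_value_iteration`, the `out_value` parameter, the discount factor, and the toy domain are all assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple

State = str
Action = str
# transitions[s][a] is a list of (next_state, probability) pairs.
Transitions = Dict[State, Dict[Action, List[Tuple[State, float]]]]


def restricted_value_iteration(
    transitions: Transitions,
    reward: Dict[State, float],
    envelope: Set[State],
    out_value: float = 0.0,   # assumed value of leaving the restricted set
    gamma: float = 0.9,       # assumed discount factor
    sweeps: int = 100,        # more sweeps give a more refined policy
) -> Tuple[Dict[State, float], Dict[State, Action]]:
    """Value iteration over `envelope` only.

    States outside `envelope` are lumped into one absorbing outcome
    worth `out_value`. Every state in `envelope` is assumed to have
    at least one action.
    """
    values = {s: 0.0 for s in envelope}

    def backup(s: State, a: Action) -> float:
        # One-step lookahead; unknown successors fall back to out_value.
        expected = sum(p * values.get(nxt, out_value)
                       for nxt, p in transitions[s][a])
        return reward[s] + gamma * expected

    for _ in range(sweeps):
        values = {s: max(backup(s, a) for a in transitions[s])
                  for s in envelope}

    policy = {s: max(transitions[s], key=lambda a: backup(s, a))
              for s in envelope}
    return values, policy


# Hypothetical toy domain: "pit" is left out of the restricted set.
toy_transitions: Transitions = {
    "s0": {"safe": [("s1", 1.0)],
           "risky": [("goal", 0.5), ("pit", 0.5)]},
    "s1": {"safe": [("goal", 1.0)]},
    "goal": {"stay": [("goal", 1.0)]},
    "pit": {"stay": [("pit", 1.0)]},
}
toy_reward = {"s0": 0.0, "s1": 0.0, "goal": 1.0, "pit": -1.0}

values, policy = restricted_value_iteration(
    toy_transitions, toy_reward,
    envelope={"s0", "s1", "goal"}, out_value=-10.0,
)
```

Lowering `sweeps` or shrinking `envelope` trades policy quality for computation time, which is the kind of trade-off a deliberation scheduler would have to weigh.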

Similar Papers

Comparative effectiveness research on patients with acute ischemic stroke using Markov decision processes
Darong Wu ... Yuanqi Zhao
BMC Medical Research Methodology | VOL. 12
09 Mar 2012

A state action frequency approach to throughput maximization over uncertain wireless channels
Krishna Jagannathan ... Eytan Modiano
01 Apr 2011

Dynamic non-uniform abstractions for approximate planning in large structured stochastic domains
J Baum ... A E Nicholson
01 Jan 1998

CAC and routing for multi‐service networks with blocked wide‐band calls delayed, part I: exact link MDP framework
Ernst Nordström ... Zbigniew Dziong
European Transactions on Telecommunications | VOL. 17
05 Jul 2005
