Abstract
The Partially Observable Markov Decision Process (POMDP) is a fundamental model for probabilistic planning in stochastic domains. More recently, the constrained POMDP and the chance-constrained POMDP have extended the model by allowing constraints to be placed on some aspects of the policy in addition to the objective function. Despite their expressive power, these models assume that every action takes a fixed duration, which limits their ability to capture real-world planning problems. In this work, we propose a unified model for the durative POMDP and its constrained extensions. First, we convert these extensions into an Integer Linear Programming (ILP) formulation, which can be solved with existing solvers from the ILP literature. Second, we provide a heuristic search approach that efficiently prunes the search space, guided by solving successive partial ILP programs. Third, we give a theoretical analysis of the problem: unlike unconstrained short-horizon POMDPs, whose constant-depth policies can be computed in polynomial time, the constrained extensions are NP-hard even with a planning horizon of two and non-negative rewards. To address this, we propose a Fully Polynomial Time Approximation Scheme (FPTAS) that computes (near-)optimal deterministic policies in polynomial time, with an approximation ratio among the best achievable in theory. Finally, evaluation results show that our approach empirically outperforms the state-of-the-art fixed-horizon chance-constrained POMDP solver.
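To give a concrete sense of the ILP route mentioned in the abstract, the following is a minimal, purely illustrative sketch, not the paper's actual formulation: a toy policy-selection problem with a single cost constraint, encoded as an ILP and handed to an off-the-shelf solver (PuLP with the bundled CBC backend here). All variable names, rewards, costs, and the budget are hypothetical.

```python
# Illustrative sketch only: a toy constrained policy-selection ILP in the
# spirit of the paper's approach. Numbers and structure are hypothetical.
import pulp

values = [4.0, 6.0, 5.0]   # hypothetical expected reward of each candidate policy
costs = [2.0, 5.0, 3.0]    # hypothetical expected cost of each candidate policy
budget = 4.0               # hypothetical cost bound (the constraint)

prob = pulp.LpProblem("constrained_policy_selection", pulp.LpMaximize)

# Binary variable x[i] = 1 iff candidate policy i is selected.
x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(len(values))]

# Objective: maximize the expected reward of the chosen policy.
prob += pulp.lpSum(v * xi for v, xi in zip(values, x))
# Constraint: the expected cost must respect the budget.
prob += pulp.lpSum(c * xi for c, xi in zip(costs, x)) <= budget
# Exactly one policy is selected.
prob += pulp.lpSum(x) == 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
chosen = [i for i, xi in enumerate(x) if xi.value() == 1]
print("selected policy:", chosen, "objective:", pulp.value(prob.objective))
```

On this toy instance the solver picks policy 2 (reward 5.0, cost 3.0), since the higher-reward policy 1 violates the budget; the paper's actual formulation operates over policy trees of a durative POMDP rather than an explicit list of candidates.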