Abstract
Sequential decision-making problems arise in every area of daily life and pose unique challenges for research in decision-theoretic planning. Although there has been a wide variety of research in this field, most studies have focused on single-objective problems without constraints. In many real-world applications, however, it is often desirable to keep certain costs or resource usage below a predefined level. The constrained stochastic shortest path problem (C-SSP), one of the most well-known mathematical frameworks for stochastic decision-making problems with constraints, can formally model such problems by incorporating the constraints into the model formulation. However, because of the intrinsic complexity of C-SSP, producing a deterministic optimal policy within a reasonable computation time remains an open challenge. In this paper, we propose a method that produces an optimal, deterministic policy for a C-SSP based on Lagrangian duality theory and heuristic forward search. To address the intrinsic complexity of C-SSP, the proposed method is designed to have an anytime property: the algorithm first finds a feasible, reasonably good solution quickly, then improves it incrementally until it converges to a true optimal solution. An extensive experimental evaluation on three problem domains shows that the proposed method outperforms state-of-the-art methods, producing near-optimal solutions with an optimality gap of less than 0.1%.