Abstract

The POMDP is a fundamental model for decision making under uncertainty. As a generalization of the exact POMDP model, the bounded-parameter POMDP (BPOMDP) specifies only upper and lower bounds on the state-transition probabilities, observation probabilities, and rewards, which makes it particularly suitable for settings where the underlying model is imprecisely known or time-varying. This paper presents an optimistic optimality criterion for solving BPOMDPs, under which the optimistically optimal value function is defined. By representing a policy explicitly as a finite-state controller, we propose a policy iteration approach that is shown to converge to an $\epsilon$-optimal policy under the optimistic optimality criterion.
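To illustrate the flavor of the optimistic criterion, the following is a minimal sketch (not the paper's algorithm) of the standard optimistic resolution of interval transition probabilities used in bounded-parameter models: among all distributions consistent with the given lower and upper bounds, pick the one that maximizes expected value. The function name `optimistic_dist` and the greedy construction shown are illustrative assumptions.

```python
def optimistic_dist(lo, hi, values):
    """Choose a transition distribution p with lo[i] <= p[i] <= hi[i]
    and sum(p) == 1 that maximizes sum(p[i] * values[i]).

    Greedy construction: start every state at its lower bound, then
    pour the remaining probability mass into the highest-value states
    first, up to each state's upper bound.
    """
    n = len(values)
    p = list(lo)
    slack = 1.0 - sum(lo)  # mass still to be distributed
    # Visit successor states from highest to lowest value.
    for i in sorted(range(n), key=lambda i: -values[i]):
        add = min(hi[i] - lo[i], slack)
        p[i] += add
        slack -= add
    return p

# Hypothetical three-state example: bounds on transition probabilities
# and current value estimates for the successor states.
lo = [0.1, 0.2, 0.1]
hi = [0.6, 0.7, 0.5]
values = [5.0, 1.0, 3.0]
p = optimistic_dist(lo, hi, values)
```

The greedy step is optimal because the objective is linear in `p`: any mass moved from a lower-value state to a higher-value state (within the bounds) cannot decrease the expectation.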
