Abstract
We study the control of completely observed Markov chains with safety bounds as introduced by Arapostathis et al. (2005), but with more general safety constraints and the added requirement of optimality. There, safety bounds were specified as pairs of unit-interval-valued vectors giving componentwise lower and upper bounds on the state probability distribution. In this paper we generalize the constraint set to any convex set defined by linear constraints, and we present a way to compute a stationary control policy that is safe (i.e., keeps the state distribution safe whenever the initial distribution is safe) and at the same time long-run average optimal. We propose a linear programming formulation for computing such a safe optimal policy. Under the simplifying assumption that the optimal policy is ergodic, we present a finitely terminating iterative algorithm to compute the maximal invariant safe set (MISS), the set in which the initial distribution must lie so that all future distributions remain safe. Our approach yields an upper bound on the number of iterations needed for the algorithm to terminate. In particular, for two-state chains we show that at most one iteration is needed to compute the MISS.
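The abstract does not spell out the linear program, so the following is a rough illustration only: a minimal, runnable sketch of the standard occupation-measure LP for long-run average cost, with the safety requirement added as linear inequalities on the stationary distribution. The transition matrices, costs, and box-type safety bounds below are invented for illustration, and the paper's actual formulation (over a general linear convex constraint set) may differ.

```python
# Hypothetical sketch: occupation-measure LP for a safe, long-run average
# optimal policy on a toy 2-state, 2-action chain. All numbers are made up.
import numpy as np
from scipy.optimize import linprog

n_states, n_actions = 2, 2

# Transition probabilities P[a][x][y] and per-step costs c[x][a]
# (illustrative values only; each row P[a][x] sums to 1).
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.7, 0.3]]])
c = np.array([[1.0, 2.0],
              [4.0, 0.5]])

# Decision variables rho(x, a), flattened as rho[x * n_actions + a].
def idx(x, a):
    return x * n_actions + a

# Objective: minimize the long-run average cost sum_{x,a} c(x,a) rho(x,a).
cost = np.array([c[x, a] for x in range(n_states) for a in range(n_actions)])

# Equality constraints: flow balance at each state, plus normalization.
A_eq, b_eq = [], []
for y in range(n_states):
    row = np.zeros(n_states * n_actions)
    for a in range(n_actions):
        row[idx(y, a)] += 1.0
    for x in range(n_states):
        for a in range(n_actions):
            row[idx(x, a)] -= P[a, x, y]
    A_eq.append(row); b_eq.append(0.0)
A_eq.append(np.ones(n_states * n_actions)); b_eq.append(1.0)

# Safety constraints: the stationary distribution mu(y) = sum_a rho(y,a)
# must lie in a linear convex set; here, box bounds lo <= mu <= hi,
# as in the Arapostathis et al. setting.
lo, hi = np.array([0.3, 0.2]), np.array([0.8, 0.7])
A_ub, b_ub = [], []
for y in range(n_states):
    row = np.zeros(n_states * n_actions)
    for a in range(n_actions):
        row[idx(y, a)] = 1.0
    A_ub.append(row);  b_ub.append(hi[y])   # mu(y) <= hi(y)
    A_ub.append(-row); b_ub.append(-lo[y])  # mu(y) >= lo(y)

res = linprog(cost, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=[(0, None)] * (n_states * n_actions))
assert res.success, res.message

# Recover a stationary randomized policy pi(a | x) = rho(x,a) / mu(x).
# (Safe here since the safety bounds keep every mu(x) strictly positive.)
rho = res.x.reshape(n_states, n_actions)
mu = rho.sum(axis=1)
policy = rho / mu[:, None]
print("average cost:", res.fun)
print("stationary distribution:", mu)
print("policy:\n", policy)
```

If the LP is feasible, the optimal occupation measure rho encodes a safe stationary randomized policy as above; the separate iterative computation of the MISS, which characterizes the admissible initial distributions, is not sketched here.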