Abstract
We present a method to find a cost-optimal policy for a given finite-horizon Markov decision process (MDP) with unknown transition probabilities, such that the probability of satisfying a given signal temporal logic (STL) specification is above a desired threshold. We propose an augmentation of the MDP state space that enables the STL objective to be expressed as a reachability objective. In this augmented space, the optimal policy problem is reformulated as a finite-horizon constrained Markov decision process (CMDP). We then develop a model-free reinforcement learning (RL) scheme that provides an approximately optimal policy for any general finite-horizon CMDP problem. This scheme can make use of any off-the-shelf model-free RL algorithm and considers the general space of non-stationary randomized policies. Finally, we illustrate the applicability of our RL-based approach through two case studies.
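To make the augmentation idea concrete, here is a minimal sketch of how an STL objective of the form "eventually reach a goal set within the horizon" can be latched into an extra Boolean component of the state, so that satisfaction probability becomes a reachability probability on the augmented space. The MDP, goal set, horizon, and policy below are illustrative assumptions, not the paper's actual construction or benchmarks.

```python
# Hypothetical sketch: augment the MDP state with a Boolean "satisfied" flag
# so that "eventually reach GOAL within H steps" (a simple STL fragment)
# becomes a reachability objective on the augmented state (s, flag).
# The transition model P is hidden from the agent, matching the
# model-free setting; we only sample from it.
import random

# A small 4-state chain MDP; actions: 0 ("stay-ish"), 1 ("advance-ish").
P = {
    (0, 0): [(0, 0.9), (1, 0.1)], (0, 1): [(0, 0.2), (1, 0.8)],
    (1, 0): [(1, 0.9), (2, 0.1)], (1, 1): [(1, 0.2), (2, 0.8)],
    (2, 0): [(2, 0.9), (3, 0.1)], (2, 1): [(2, 0.2), (3, 0.8)],
    (3, 0): [(3, 1.0)],           (3, 1): [(3, 1.0)],
}
GOAL = {3}   # stands in for the region named by the STL formula
H = 6        # finite horizon

def step(s, a):
    """Sample a successor state; the agent never reads P directly."""
    r, acc = random.random(), 0.0
    for s2, p in P[(s, a)]:
        acc += p
        if r <= acc:
            return s2
    return P[(s, a)][-1][0]

def augmented_step(aug, a):
    """Augmented transition: the flag latches once GOAL is visited."""
    s, sat = aug
    s2 = step(s, a)
    return (s2, sat or (s2 in GOAL))

def estimate_satisfaction(policy, n_episodes=20000):
    """Monte-Carlo estimate of P(STL satisfied) = P(flag set at horizon)."""
    hits = 0
    for _ in range(n_episodes):
        aug = (0, 0 in GOAL)
        for t in range(H):
            aug = augmented_step(aug, policy(aug, t))
        hits += aug[1]
    return hits / n_episodes

# A trivial non-stationary-capable policy (here it ignores t).
always_advance = lambda aug, t: 1
print(estimate_satisfaction(always_advance))
```

In the paper's setting this satisfaction probability would enter the CMDP as a chance constraint, with a separate cost being minimized; the sketch only shows the augmentation and estimation step.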