Model-based motion planning in POMDPs with temporal logic specifications

Junchao Li, Mingyu Cai, Zhaoan Wang, Shaoping Xiao

doi:10.1080/01691864.2023.2226191

Abstract

Partially observable Markov decision processes (POMDPs) have been used as mathematical models for sequential decision-making under uncertain and incomplete information. Because the state of a POMDP is only partially observable, the agent must base its decisions on information accumulated from its history of actions and observations. This study addresses probabilistic motion planning problems in which the agent is assigned a complex task in a partially observable environment. We use linear temporal logic (LTL) to specify the complex task and then convert the specification to a limit-deterministic generalized Büchi automaton (LDGBA). Using model-checking techniques, we reformulate the problem as finding an optimal policy on the product of the POMDP and the LDGBA. This paper adopts and modifies two reinforcement learning (RL) approaches: value iteration and deep Q-learning. Both are model-based because the optimal policy is a function of belief states, and updating a belief state requires the transition and observation probabilities. We illustrate the applicability of the proposed methods with two simulations: a grid-world problem of various sizes and a TurtleBot office path planning problem.
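To illustrate why a belief-state policy needs the model, the standard Bayes-filter belief update, b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s), can be sketched as follows. This is a minimal sketch and not the implementation used in the paper: the function name `belief_update`, the nested-list layout of `T` and `O`, and the two-state numbers are all hypothetical and chosen only for illustration.

```python
def belief_update(belief, action, observation, T, O):
    """One Bayes-filter step on a discrete belief state.

    Assumed (hypothetical) layout:
      T[action][s][s_next]   = P(s_next | s, action)
      O[action][s_next][obs] = P(obs | s_next, action)
    """
    n = len(belief)
    # Predict with the transition model, then weight by the observation likelihood.
    unnormalized = [
        O[action][s_next][observation]
        * sum(T[action][s][s_next] * belief[s] for s in range(n))
        for s_next in range(n)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief and action")
    # Normalize so the updated belief sums to one.
    return [p / total for p in unnormalized]


# Hypothetical two-state model with a single action, for illustration only.
T = {"stay": [[0.9, 0.1], [0.2, 0.8]]}
O = {"stay": [[0.8, 0.2], [0.3, 0.7]]}
b0 = [0.5, 0.5]
b1 = belief_update(b0, "stay", 0, T, O)
```

Both `T` and `O` appear in the update, so the belief cannot be maintained from actions and observations alone, which is the sense in which the approaches above are model-based.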

Full Text