Abstract

This paper considers the problem of deploying a robot from a specification given as a temporal logic statement about some properties satisfied by the regions of a large, partitioned environment. We assume that the robot has noisy sensors and actuators and model its motion through the regions of the environment as a Markov decision process (MDP). The robot control problem then becomes one of finding the control policy that maximizes the probability of satisfying the temporal logic task on the MDP. For a large environment, obtaining the transition probabilities for each state–action pair and solving the optimization problem for the optimal policy are both computationally intensive. To address these issues, we propose an approximate dynamic programming framework based on a least-squares temporal difference learning method of the actor–critic type. This framework operates on sample paths of the robot and optimizes a randomized control policy with respect to a small set of parameters; the transition probabilities are obtained only when needed. Simulations confirm that convergence of the parameters translates to an approximately optimal policy.
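
To make the moving parts concrete, the following is a minimal, self-contained sketch of a least-squares temporal difference (LSTD) actor–critic iteration of the kind the abstract describes: a softmax randomized policy over a small parameter vector, a critic that accumulates LSTD statistics over the policy's score-function features, and transitions sampled only when needed. This is not the paper's implementation; the toy MDP, feature choice, step sizes, and the discounted reachability reward are all illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy MDP standing in for the partitioned environment (sizes hypothetical).
    nS, nA = 6, 2                                  # regions and control actions
    P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a] = next-state distribution
    goal = nS - 1                                  # reward 1 on entering the goal region
    d = nS * nA                                    # number of policy parameters

    def feat(s, a):
        """One-hot state-action features phi(s, a)."""
        x = np.zeros(d)
        x[s * nA + a] = 1.0
        return x

    def policy(theta, s):
        """Randomized stationary policy: softmax in theta over phi(s, .)."""
        logits = np.array([theta @ feat(s, a) for a in range(nA)])
        logits -= logits.max()
        w = np.exp(logits)
        return w / w.sum()

    def score(theta, s, a):
        """psi(s, a) = grad_theta log mu_theta(a|s); the critic's features."""
        p = policy(theta, s)
        return feat(s, a) - sum(p[b] * feat(s, b) for b in range(nA))

    theta = np.zeros(d)            # actor parameters
    A = 1e-3 * np.eye(d)           # LSTD statistics: solving A r = b_vec gives r
    b_vec = np.zeros(d)
    gamma, lam, beta = 0.95, 0.9, 0.05

    for episode in range(500):
        s = 0
        a = rng.choice(nA, p=policy(theta, s))
        z = np.zeros(d)            # eligibility trace
        for _ in range(50):
            s2 = rng.choice(nS, p=P[s, a])     # transition sampled only when needed
            rew = 1.0 if s2 == goal else 0.0
            a2 = rng.choice(nA, p=policy(theta, s2))
            psi, psi2 = score(theta, s, a), score(theta, s2, a2)
            # Critic: accumulate least-squares TD statistics, then solve for r.
            z = gamma * lam * z + psi
            A += np.outer(z, psi - gamma * psi2)
            b_vec += rew * z
            r = np.linalg.solve(A, b_vec)
            # Actor: ascend the estimated policy gradient, Q(s,a) ~ r^T psi(s,a).
            theta += beta * (r @ psi) * psi
            if s2 == goal:
                break
            s, a = s2, a2

The key design point the sketch mirrors is that neither the critic nor the actor ever needs the full transition matrix: both updates consume only the sampled tuple (s, a, s2, a2), which is what makes the approach viable for large environments.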

Highlights

  • One major goal in robot motion planning and control is to specify a mission task in an expressive, high-level language and to convert the task automatically into a control strategy for the robot

  • This paper extends our earlier work [19], in which we proposed an actor-critic method for maximal reachability probability (MRP) problems, i.e., maximizing the probability of reaching a set of states, to a computational framework that finds a control policy such that the probability of its paths satisfying an arbitrary Linear Temporal Logic (LTL) formula is locally optimal over a set of parameters

  • We present a framework that brings together an approximate dynamic programming computational method of the actor-critic type with formal control synthesis for Markov Decision Processes (MDPs) from temporal logic specifications


Summary

INTRODUCTION

One major goal in robot motion planning and control is to specify a mission task in an expressive, high-level language and to convert the task automatically into a control strategy for the robot. This paper extends our earlier work [19], in which we proposed an actor-critic method for maximal reachability probability (MRP) problems, i.e., maximizing the probability of reaching a set of states, to a computational framework that finds a control policy such that the probability of its paths satisfying an arbitrary Linear Temporal Logic (LTL) formula is locally optimal over a set of parameters. This set of parameters is tailored to this class of approximate dynamic programming problems. Notation: the transpose of a vector x is denoted by x^T; ‖·‖ denotes the Euclidean norm; |S| denotes the cardinality of a set S.
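
For intuition on how an LTL task reduces to an MRP-style problem, the following is a minimal sketch of the standard product-automaton construction for a simple "eventually pickup, then eventually dropoff" task. The regions, labels, and automaton here are hypothetical toy inputs; for full LTL, the standard construction uses a deterministic Rabin automaton and targets its accepting maximal end components rather than a plain set of accepting states.

    # Labeled MDP over regions: trans[s][a] = {next_state: probability}.
    trans = {
        "s0": {"go": {"s1": 0.9, "s0": 0.1}},
        "s1": {"go": {"s2": 0.8, "s0": 0.2}},
        "s2": {"stay": {"s2": 1.0}},
    }
    label = {"s0": set(), "s1": {"pickup"}, "s2": {"dropoff"}}

    def aut_delta(q, props):
        """Deterministic automaton for 'eventually pickup, then dropoff'."""
        if q == "q0" and "pickup" in props:
            return "q1"
        if q == "q1" and "dropoff" in props:
            return "q_acc"                     # accepting
        return q

    # Product MDP: states (s, q); the automaton advances on the label of the
    # region the robot enters.
    prod, frontier = {}, [("s0", "q0")]
    while frontier:
        s, q = frontier.pop()
        if (s, q) in prod:
            continue
        prod[(s, q)] = {
            act: {(s2, aut_delta(q, label[s2])): p for s2, p in dist.items()}
            for act, dist in trans[s].items()
        }
        for dist in prod[(s, q)].values():
            frontier.extend(dist.keys())

    accept = [sq for sq in prod if sq[1] == "q_acc"]
    # Maximizing P(task satisfied) = maximizing P(reach `accept`): an MRP problem.
    print(len(prod), "product states; accepting:", accept)

Once the problem is in this reachability form on the product MDP, the parameterized actor-critic machinery sketched after the abstract applies directly, with the accepting set playing the role of the goal states.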

PROBLEM FORMULATION AND APPROACH
Formulation of the MRP Problem
LSTD Actor-Critic Method
Designing an RSP
Overall Algorithm
HARDWARE-IN-THE-LOOP SIMULATION
Environment
Construction of the MDP model
Task specification and results
CONCLUSIONS