Abstract

A target moves among a finite number of cells according to a discrete-time homogeneous Markov chain. The searcher is subject to constraints on the search path, i.e., the cells available for search in the current epoch are a function of the cell searched in the previous epoch. The aim is to identify a search policy that maximizes the infinite-horizon total expected reward earned. We show the following structural results under the assumption that the target transition matrix is ergodic: 1) the optimal search policy is stationary; and 2) there exist ε-optimal stationary policies that may be constructed by the standard value iteration algorithm in finite time. These results are obtained by showing that the dynamic programming operator associated with the search problem is an m-stage contraction mapping on a suitably defined space. Upper bounds on m and on the contraction coefficient α are given in terms of the transition matrix and other parameters of the search problem. These bounds on m and α may be used to derive performance bounds on the suboptimal search policies constructed.
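The value-iteration construction of an ε-optimal stationary policy mentioned above can be illustrated with a generic discounted dynamic program. The sketch below is not the paper's m-stage formulation: it assumes an ordinary one-stage α-contraction (i.e., a discount factor α < 1) and hypothetical reward and transition arrays `R` and `P`; path constraints could be encoded by masking unavailable actions. The stopping rule uses the standard contraction bound guaranteeing that the greedy policy at termination is ε-optimal.

```python
import numpy as np

def value_iteration(P, R, alpha, eps):
    """Generic discounted value iteration (a sketch, not the paper's method).

    P     : (A, S, S) array, P[a, s, t] = transition probability s -> t under action a
    R     : (S, A) array of one-stage rewards
    alpha : contraction coefficient (discount factor), 0 < alpha < 1
    eps   : desired suboptimality of the returned stationary policy

    Stops when successive iterates differ by less than eps*(1-alpha)/(2*alpha)
    in sup-norm; by the standard contraction argument, the greedy policy with
    respect to the final iterate is then eps-optimal.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    while True:
        # One application of the dynamic programming operator (Bellman backup)
        Q = R + alpha * np.einsum('ast,t->sa', P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < eps * (1 - alpha) / (2 * alpha):
            return V_new, Q.argmax(axis=1)  # value estimate and greedy stationary policy
        V = V_new
```

Under an m-stage contraction, the same termination test would be applied to iterates m steps apart rather than consecutive ones, with α the m-stage coefficient.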
