Abstract

We consider a task prioritization problem for an active sonar tracking system in which the available ping resources may be insufficient to sustain all tracking tasks at any given time. In this problem, the time-varying conditions of a tracking task are represented by a finite-state discrete-time Markov decision process. The objective is to find a policy that decides at each time interval which tracking tasks to perform so as to maximize the aggregate reward over time. This paper addresses the derivation of the Markov chain parameters from sonar tracking system simulations, the formulation of task prioritization as a restless bandit (TPRB) problem, and the TPRB policy obtained by a primal-dual index heuristic based on a first-order linear programming relaxation of the TPRB problem. The superior performance of the resulting TPRB policy is demonstrated using Monte Carlo simulations on various multi-target scenarios.
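To illustrate the kind of policy the abstract describes, the following is a minimal sketch of a restless-bandit task selection loop. All specifics here are assumptions for illustration only: the number of tasks, the 3-state track-quality chains, the transition matrices, the rewards, and the per-state priority indices are hand-chosen and are not the parameters or the primal-dual indices derived in the paper. At each interval the policy computes an index for every task from its current state and pings the top-M tasks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: N tracking tasks, each a 3-state Markov chain
# (state 0 = well tracked, ..., state 2 = nearly lost). At most M
# pings are available per interval. Transitions differ depending on
# whether a task is pinged (active) or skipped (idle).
N, M, S = 5, 2, 3
P_active = np.array([[0.9, 0.1, 0.0],
                     [0.6, 0.3, 0.1],
                     [0.3, 0.5, 0.2]])   # pinging tends to improve the state
P_idle = np.array([[0.6, 0.3, 0.1],
                   [0.1, 0.6, 0.3],
                   [0.0, 0.2, 0.8]])     # idling tends to degrade the state
reward = np.array([1.0, 0.5, 0.1])       # per-interval reward by state

# Hand-chosen per-state priority index standing in for the indices a
# first-order LP relaxation would produce: worse states rank higher.
index = np.array([0.1, 0.6, 1.0])

def step(states, policy="index"):
    """Select M tasks to ping, collect reward, advance all chains."""
    if policy == "index":
        chosen = np.argsort(index[states])[-M:]          # top-M index values
    else:
        chosen = rng.choice(N, size=M, replace=False)    # random baseline
    r = reward[states].sum()
    for i in range(N):
        P = P_active if i in chosen else P_idle
        states[i] = rng.choice(S, p=P[states[i]])
    return r

def run(policy, T=2000):
    """Average per-interval reward of a policy over T intervals."""
    states = np.zeros(N, dtype=int)
    return sum(step(states, policy) for _ in range(T)) / T

print(f"index policy : {run('index'):.3f}")
print(f"random policy: {run('random'):.3f}")
```

The greedy top-M rule is the typical shape of an index heuristic for restless bandits; the substance of the paper's method lies in deriving the index values themselves from the LP relaxation, which this sketch does not reproduce.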
