In sensor management, existing research relies on traditional system modeling and strives to maximize information superiority. In practice, however, complex environmental disturbances, incomplete information, and uncooperative behavior in air combat missions often give rise to unknown system evolution; moreover, while fully exploiting sensor effectiveness is certainly essential, detection security is the primary guarantee. This paper formulates the airborne sensor task assignment problem in unknown dynamic environments. Unlike traditional methods that minimize the estimation error covariance or information entropy based on a system dynamics model, our scheme maximizes agent survival while maintaining the necessary sensor detection without such model support. Because existing reinforcement learning methods cannot be applied directly to this assignment problem, we design the state space and rewards carefully to meet actual combat requirements. First, instead of taking the locations of agents and targets as fundamental, continuous (hence infinite) state variables, we adopt situation variables, namely the target threat ranking together with the cumulative radiation and information-acquisition indicators of the sensors, all of which are discrete and thus reduce the computational burden. Second, the reward structure is built around the complex mission constraints: it encourages low assignment risk and relatively full utilization of sensing, while penalizing overly dangerous continued assignment and inadequate assignment revenue. Simulations show that the proposed scheme achieves a desirable mission completion rate and acceptable target tracking accuracy.
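The discrete state and reward design described above can be illustrated with a minimal tabular Q-learning sketch. All names, state ranges, and reward weights below are illustrative assumptions, not the paper's actual parameters: the state is a hypothetical triple (threat rank, bucketed cumulative radiation, acquisition indicator), and the reward rewards sensing revenue while penalizing emission under high cumulative radiation and inadequate detection.

```python
from collections import defaultdict

# Assumed discrete situation variables (ranges are illustrative):
THREAT_RANKS = range(3)       # 0 = low threat ... 2 = high threat
RADIATION_LEVELS = range(4)   # bucketed cumulative radiation of the sensor
ACTIONS = (0, 1)              # 0 = keep sensor silent, 1 = assign sensor

def reward(threat_rank, radiation_level, acquired, action):
    """Assumed reward shaping: encourage sensing revenue and low-risk
    assignment; penalize emitting under high cumulative radiation
    (dangerous continued assignment) and inadequate detection."""
    if action == 1:
        r = 1.0 if acquired else 0.2   # sensing revenue
        r -= 0.5 * radiation_level     # risky-continuance penalty
        r -= 0.3 * threat_rank         # threat-exposure penalty
    else:
        r = -0.4 * (1 - acquired)      # inadequate-assignment penalty
    return r

# One tabular Q-learning update over the discrete state space.
Q = defaultdict(float)
alpha, gamma = 0.1, 0.9

def q_update(state, action, r, next_state):
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (r + gamma * best_next - Q[(state, action)])

s = (2, 3, 0)             # high threat, high radiation, nothing acquired yet
r = reward(*s, action=1)  # emitting here is strongly penalized
q_update(s, 1, r, (2, 3, 1))
print(round(r, 2))        # -> -1.9
```

Because every state component is a small discrete variable, the table has only a few dozen entries per sensor, which is the computational-burden argument made in the abstract; the actual paper's state and reward definitions may differ.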