APPLYING REINFORCEMENT LEARNING TO THE WEAPON ASSIGNMENT PROBLEM IN AIR DEFENCE

Hildegarde Mouton, Herman Le Roux, Jan Roodt

doi:10.5787/39-2-115

Abstract

The modern battlefield is a fast-paced, information-rich environment in which discovery of intent, situation awareness and the rapid evolution of concepts of operation and doctrine are critical success factors. The techniques investigated and tested in this work, combined with other techniques in artificial intelligence (AI) and modern computation, may hold the key to relieving the burden on the decision-maker and to supporting better decision-making under pressure. Two methods from reinforcement learning (RL), a subfield of machine learning, were investigated: a Monte Carlo (MC) control algorithm with exploring starts (MCES), and Q-learning, an off-policy temporal-difference (TD) control algorithm. Both were applied to a simplified version of the weapon assignment (WA) problem in air defence. The MCES control algorithm yielded promising results when searching for an optimal shooting order. A greedy approach was taken in the Q-learning algorithm, but experimentation showed that the MCES control algorithm still performed significantly better than Q-learning, even though it was slower.
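To make the idea of learning a shooting order concrete, the following is a minimal sketch of tabular Q-learning on a toy ordering problem. It is not the authors' WA formulation: the threat values, the per-step decay, the reward function, the epsilon-greedy exploration and all names (`THREAT_VALUES`, `DECAY`, `q_learning`, `greedy_order`) are assumptions chosen only for illustration.

```python
import random

# Hypothetical toy problem, NOT the article's model: three threats with
# made-up values; engaging a threat one step later halves its payoff.
THREAT_VALUES = [3.0, 5.0, 1.0]
DECAY = 0.5


def reward(threat, step):
    """Assumed reward for engaging `threat` at position `step` in the order."""
    return THREAT_VALUES[threat] * DECAY ** step


def q_learning(episodes=2000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning; the state is the set of threats not yet engaged."""
    rng = random.Random(seed)
    n = len(THREAT_VALUES)
    q = {}  # maps (remaining_threats, action) -> estimated return

    for _ in range(episodes):
        remaining = frozenset(range(n))
        while remaining:
            actions = sorted(remaining)
            # Epsilon-greedy behaviour policy (an assumption of this sketch).
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q.get((remaining, a), 0.0))

            step = n - len(remaining)
            nxt = remaining - {action}
            # Off-policy target: bootstrap from the best next action.
            best_next = max((q.get((nxt, a), 0.0) for a in nxt), default=0.0)
            target = reward(action, step) + gamma * best_next
            old = q.get((remaining, action), 0.0)
            q[(remaining, action)] = old + alpha * (target - old)
            remaining = nxt
    return q


def greedy_order(q):
    """Read the learned shooting order off the Q-table greedily."""
    remaining = frozenset(range(len(THREAT_VALUES)))
    order = []
    while remaining:
        action = max(sorted(remaining), key=lambda a: q.get((remaining, a), 0.0))
        order.append(action)
        remaining = remaining - {action}
    return order


q_table = q_learning()
print(greedy_order(q_table))
```

Under these assumed rewards the best order engages threats in descending value, so the learned greedy order should place threat 1 first, then threat 0, then threat 2. An MCES variant would instead average complete-episode returns from randomly chosen starting state-action pairs rather than bootstrapping from the next state's estimate.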
