Autonomous security unmanned aerial vehicles (UAVs) have recently gained popularity as an effective solution for target/intrusion detection and tracking tasks with little or no human intervention. In this context, we aim to develop an autonomous UAV system for detecting a dynamic and uncertain intrusion within an area, in which the intruder/target moves from one location to another according to an unknown random distribution. The problem of finding the target, subject to the energy causality constraint of the UAV’s battery and the uncertainty of the target's movement, is mathematically formulated as a search benefit maximization problem, which cannot be optimized directly because the target's movement distribution is unknown. We therefore reformulate the optimization problem as a Markov decision process that can be solved using reinforcement learning (RL) techniques. We then implement an RL-based algorithm to solve the reformulated benefit maximization problem by enabling the UAV to autonomously learn the dynamics of the intruder/target. Specifically, we implement several design variants of the RL-based algorithm, which differ in the temporal-difference method used (Q-learning or state-action-reward-state-action, SARSA) and in the exploration algorithm (convergence-based or <formula><tex>$\epsilon$</tex></formula>-greedy). Simulation results show that the RL-based algorithms outperform existing random and circular target detection algorithms.
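
As a rough illustration of the two temporal-difference variants and the ε-greedy exploration rule named in the abstract, the following minimal tabular sketch may help. It is not the authors' implementation: the function names `epsilon_greedy` and `td_update`, the list-of-lists Q-table, and the learning rate `alpha` and discount factor `gamma` are assumptions introduced here for illustration. The convergence-based exploration variant is not sketched because the abstract does not define it.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Choose an action index for one state (hypothetical helper).

    With probability epsilon a uniformly random action is explored;
    otherwise the first action with the highest Q-value is exploited.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return q_row.index(max(q_row))


def td_update(q, s, a, r, s_next, a_next, alpha, gamma, method):
    """Apply one temporal-difference update to q[s][a] in place.

    method == "q_learning": bootstrap from the best next action (off-policy).
    method == "sarsa": bootstrap from the next action actually taken (on-policy).
    alpha (learning rate) and gamma (discount factor) are assumed parameters.
    """
    if method == "q_learning":
        target = r + gamma * max(q[s_next])
    elif method == "sarsa":
        target = r + gamma * q[s_next][a_next]
    else:
        raise ValueError("method must be 'q_learning' or 'sarsa'")
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy Q-table: 2 states (e.g. grid cells) x 2 actions (e.g. moves).
    q = [[0.0, 0.0], [1.0, 2.0]]
    a_next = epsilon_greedy(q[1], epsilon=0.1, rng=rng)
    print(td_update(q, 0, 0, 1.0, 1, a_next, 0.5, 0.9, "sarsa"))
```

The only difference between the two methods in this sketch is the bootstrap term: Q-learning uses the maximum Q-value of the next state regardless of which action the exploration rule picks, while SARSA uses the Q-value of the action that rule actually selected.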