The multi-UAV target search problem is crucial in the field of autonomous Unmanned Aerial Vehicle (UAV) decision-making. The algorithm design of Multi-Agent Reinforcement Learning (MARL) methods has become integral to research on multi-UAV target search owing to its adaptability to the rapid online decision-making required by UAVs in complex, uncertain environments. In non-cooperative target search scenarios, targets may have the ability to escape. Target probability maps are used in many studies to characterize the likelihood of a target’s existence, guiding the UAV to efficiently explore the task area and locate the target more quickly. However, the escape behavior of the target causes the target probability map to deviate from the actual target’s position, thereby reducing its effectiveness in measuring the target’s probability of existence and diminishing the efficiency of the UAV search. This paper investigates the multi-UAV target search problem in scenarios involving static obstacles and dynamic escape targets, modeling the problem within the framework of decentralized partially observable Markov decision process. Based on this model, a spatio-temporal efficient exploration network and a global convolutional local ascent mechanism are proposed. Subsequently, we introduce a multi-UAV Escape Target Search algorithm based on MAPPO (ETS–MAPPO) for addressing the escape target search difficulty problem. Simulation results demonstrate that the ETS–MAPPO algorithm outperforms five classic MARL algorithms in terms of the number of target searches, area coverage rate, and other metrics.