Abstract

Visual saliency prediction plays an important role in Unmanned Aerial Vehicle (UAV) video analysis tasks. In this paper, an efficient saliency prediction model for UAV video is proposed based on spatial–temporal features, prior information, and inter-frame relationships; its high efficiency comes from a simplified network design. Since UAV videos usually cover wide scenes containing various background disturbances, a cascaded coarse-to-fine feature extraction module is proposed, in which a saliency-related feature sub-network first extracts useful cues from each frame and a new convolution block then captures spatial–temporal features. This structure achieves strong accuracy at high speed within a 2D CNN framework. Moreover, a multi-stream prior module is proposed to model the viewing-bias phenomenon in UAV video scenes; it automatically learns prior information from the video context and can also incorporate other priors. Finally, based on the spatial–temporal features and the learned priors, a temporal weighted average module is proposed to model the inter-frame relationship and generate the final saliency map, making the predicted saliency maps smoother in the temporal dimension. The proposed method is compared with 17 state-of-the-art models on two public UAV video saliency prediction datasets, and the experimental results demonstrate that it outperforms the other models. Source code is available at: https://github.com/zhangkao/IIP_UAVSal_Saliency.
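The abstract does not give implementation details, but a minimal sketch may help clarify the temporal weighted average idea: smoothing per-frame saliency maps with learned, normalized weights over a short causal window of preceding frames. The sketch below assumes PyTorch; the class name TemporalWeightedAverage, the window size, and the per-offset weighting scheme are illustrative assumptions rather than the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalWeightedAverage(nn.Module):
    """Hypothetical sketch: smooth per-frame saliency maps with a
    learned weighted average over a short causal temporal window."""

    def __init__(self, window: int = 3):
        super().__init__()
        self.window = window
        # One learnable logit per temporal offset; softmax turns them
        # into normalized averaging weights.
        self.logits = nn.Parameter(torch.zeros(window))

    def forward(self, maps: torch.Tensor) -> torch.Tensor:
        # maps: (T, 1, H, W) raw saliency maps for T consecutive frames.
        T = maps.shape[0]
        weights = F.softmax(self.logits, dim=0)
        smoothed = []
        for t in range(T):
            # Gather the current frame and up to window-1 preceding
            # frames, clamping indices at the start of the clip.
            idx = [max(t - k, 0) for k in range(self.window)]
            stack = maps[idx]                    # (window, 1, H, W)
            w = weights.view(-1, 1, 1, 1)
            smoothed.append((w * stack).sum(dim=0))
        return torch.stack(smoothed, dim=0)     # (T, 1, H, W)

# Usage: smooth 8 frames of 64x64 saliency maps.
tva = TemporalWeightedAverage(window=3)
out = tva(torch.rand(8, 1, 64, 64))
print(out.shape)  # torch.Size([8, 1, 64, 64])
```

Because the weights are shared across all frames and normalized by softmax, the module adds only a handful of parameters while enforcing the temporal smoothness the abstract describes.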
