Abstract

The main challenge in video salient object detection is modeling object motion and dramatic changes in appearance contrast. In this work, we propose an attention-embedded spatio-temporal network (ASTN) to adaptively exploit, within a unified framework, the diverse factors that influence dynamic saliency prediction. To compensate for object movement, we introduce a flow-guided spatial learning (FGSL) module that directly captures effective motion information in the form of attention derived from optical flow. However, optical flow represents the motion of all moving objects, including non-salient objects moved by large camera motion and subtle background changes. Using the flow-guided attention map alone therefore lets all moving objects, rather than only the salient ones, influence the spatial saliency, resulting in unstable and temporally inconsistent saliency maps. To further enhance temporal coherence, we develop an attentive bidirectional gated recurrent unit (AB-GRU) module that adaptively exploits sequential feature evolution. With this AB-GRU, we refine the spatio-temporal feature representation by incorporating an accommodative attention mechanism. Experimental results demonstrate that our model achieves superior performance on video salient object detection. Moreover, extending our model to unsupervised video object segmentation further demonstrates its generalization ability and stability.
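The FGSL module described above turns optical flow into a spatial attention map that modulates appearance features. The sketch below is a minimal, hypothetical NumPy rendering of that idea (the paper's actual module is learned end-to-end): flow magnitude is normalized, squashed to (0, 1) with a sigmoid, and applied to the feature map as residual attention. The function name and the residual formulation are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def flow_guided_attention(features, flow):
    """Hypothetical sketch of flow-guided spatial attention:
    derive a per-pixel attention map from optical-flow magnitude
    and use it to amplify features in moving regions."""
    # flow: (H, W, 2) displacement field; features: (C, H, W)
    magnitude = np.linalg.norm(flow, axis=-1)                 # (H, W)
    # normalize magnitudes before squashing so attention is well scaled
    norm = (magnitude - magnitude.mean()) / (magnitude.std() + 1e-6)
    attention = sigmoid(norm)                                 # values in (0, 1)
    # residual attention: keep the original features, boost moving regions
    return features * (1.0 + attention[None, :, :])

feat = np.random.rand(64, 32, 32).astype(np.float32)   # toy feature map
flow = np.random.randn(32, 32, 2).astype(np.float32)   # toy flow field
out = flow_guided_attention(feat, flow)
```

As the abstract notes, such a map responds to *all* motion, including camera motion; this is exactly the limitation the AB-GRU module is introduced to address.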

Highlights

  • Video salient object detection aims to continuously identify the motion-related salient objects that most strongly attract human attention in video sequences

  • We develop an attentive bidirectional gated recurrent unit (AB-GRU) module to further enhance temporal coherence by adaptively exploiting the evolution of sequential features with an attention mechanism for spatio-temporal information modeling
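The AB-GRU highlight above combines two ingredients: a bidirectional GRU over the frame sequence and an attention mechanism that fuses the two directions. The following is a minimal NumPy sketch of that combination under stated assumptions: a standard GRU cell, and a simple softmax over per-direction scores standing in for the paper's accommodative attention (the scoring rule here is hypothetical, as are the class and function names).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal NumPy GRU cell (standard update/reset-gate formulation)."""
    def __init__(self, input_size, hidden_size, rng):
        s = 1.0 / np.sqrt(hidden_size)
        self.W = rng.uniform(-s, s, (3, input_size, hidden_size))
        self.U = rng.uniform(-s, s, (3, hidden_size, hidden_size))

    def step(self, x, h):
        z = sigmoid(x @ self.W[0] + h @ self.U[0])          # update gate
        r = sigmoid(x @ self.W[1] + h @ self.U[1])          # reset gate
        n = np.tanh(x @ self.W[2] + (r * h) @ self.U[2])    # candidate state
        return (1.0 - z) * h + z * n

def attentive_bigru(seq, cell_f, cell_b, hidden_size):
    """Run GRUs forward and backward over the sequence, then fuse the
    two hidden states per step with softmax attention weights (a
    hypothetical stand-in for the accommodative attention mechanism)."""
    T = len(seq)
    h_f = np.zeros(hidden_size)
    h_b = np.zeros(hidden_size)
    fwd, bwd = [], []
    for t in range(T):                    # forward pass over frames
        h_f = cell_f.step(seq[t], h_f)
        fwd.append(h_f)
    for t in reversed(range(T)):          # backward pass over frames
        h_b = cell_b.step(seq[t], h_b)
        bwd.append(h_b)
    bwd.reverse()                         # align backward states with time
    fused = []
    for hf, hb in zip(fwd, bwd):
        # score each direction, then softmax into fusion weights
        scores = np.array([hf.mean(), hb.mean()])
        w = np.exp(scores - scores.max())
        w /= w.sum()
        fused.append(w[0] * hf + w[1] * hb)
    return np.stack(fused)                # (T, hidden_size)

rng = np.random.default_rng(0)
cell_f = GRUCell(16, 8, rng)
cell_b = GRUCell(16, 8, rng)
seq = rng.standard_normal((5, 16))        # 5 frames of 16-d features
states = attentive_bigru(seq, cell_f, cell_b, 8)
```

Because each fused state is a convex combination of forward and backward hidden states, information from both past and future frames contributes to every time step, which is what enables the temporal-coherence refinement the highlight describes.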


Introduction

Video salient object detection aims to continuously identify the motion-related salient objects that most strongly attract human attention in video sequences. It is often used as a preprocessing step in many computer vision applications, such as video tracking, person re-identification, and video compression. Directly applying static image saliency detection strategies to video salient object detection is inappropriate, because consecutive video frames involve more sophisticated appearance information and continuous motion cues that static strategies cannot effectively model. For example, two camels appear in the 45th frame, and both are detected as salient objects by a static image saliency model (the last column of Figure 1 shows the results predicted by the state-of-the-art image saliency model DHS [3]).

