Abstract

The main challenge in video salient object detection is modeling object motion and dramatic changes in appearance contrast. In this work, we propose an attention embedded spatio-temporal network (ASTN) that adaptively exploits the diverse factors influencing dynamic saliency prediction within a unified framework. To compensate for object movement, we introduce a flow-guided spatial learning (FGSL) module that directly captures effective motion information in the form of attention derived from optical flow. However, optical flow represents the motion of all moving objects, including movements of non-salient objects caused by large camera motion and subtle changes in the background. Therefore, using the flow-guided attention map alone lets every moving object, not just the salient ones, influence the spatial saliency, resulting in unstable and temporally inconsistent saliency maps. To further enhance temporal coherence, we develop an attentive bidirectional gated recurrent unit (AB-GRU) module that adaptively exploits the evolution of sequential features. With the AB-GRU, we further refine the spatio-temporal feature representation by incorporating an accommodative attention mechanism. Experimental results demonstrate that our model achieves superior empirical performance on video salient object detection. Moreover, an experiment extending the method to unsupervised video object segmentation further demonstrates its generalization ability and stability.
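
To make the idea of flow-guided attention more concrete, the following is a minimal sketch and not the FGSL module itself. It assumes the attention map is simply the normalized optical-flow magnitude and that features are modulated residually; the actual module presumably learns this mapping. The function names `flow_attention` and `apply_attention` and the toy data are hypothetical.

```python
import math

def flow_attention(flow):
    """Turn a dense flow field into a spatial attention map in [0, 1].

    flow: H x W grid of (dx, dy) displacement tuples (an assumed layout).
    Assumption: attention is the flow magnitude divided by its maximum.
    """
    mags = [[math.hypot(dx, dy) for dx, dy in row] for row in flow]
    peak = max(max(row) for row in mags)
    if peak == 0.0:
        # No motion anywhere: return an all-zero attention map.
        return [[0.0 for _ in row] for row in mags]
    return [[m / peak for m in row] for row in mags]

def apply_attention(features, attention):
    """Residual modulation f * (1 + a): static regions keep their features."""
    return [[f * (1.0 + a) for f, a in zip(frow, arow)]
            for frow, arow in zip(features, attention)]

# Toy 2 x 2 example: one fast-moving pixel, one slow pixel, two static ones.
flow = [[(0.0, 0.0), (3.0, 4.0)],
        [(0.0, 1.0), (0.0, 0.0)]]
features = [[1.0, 1.0],
            [2.0, 2.0]]

att = flow_attention(flow)          # [[0.0, 1.0], [0.2, 0.0]]
out = apply_attention(features, att)
```

Note that this map responds to every moving pixel, salient or not, which is exactly the limitation the abstract attributes to using flow-guided attention alone.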

Highlights

  • Video salient object detection aims to continuously identify the motion-related salient objects that most strongly attract human attention in video sequences

  • We develop an attentive bidirectional gated recurrent unit (AB-GRU) module that enhances temporal coherence by adaptively exploiting the evolution of sequential features, using an attention mechanism for spatio-temporal information modeling
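
The bidirectional recurrence with attention mentioned in the highlights can be sketched as follows. This is an illustrative toy under stated assumptions, not the AB-GRU: it uses a standard GRU cell without biases on per-frame feature vectors, concatenates forward and backward states, and reweights them with a softmax over time steps. The source does not specify the form of the attention, so the scoring vector `score_vec` and all names here are hypothetical.

```python
import math
import random

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class GRUCell:
    """Plain GRU cell (no biases); each gate acts on the concatenation [x, h]."""
    def __init__(self, in_dim, hid_dim, rng):
        def init(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]
        self.Wz = init(hid_dim, in_dim + hid_dim)  # update gate
        self.Wr = init(hid_dim, in_dim + hid_dim)  # reset gate
        self.Wh = init(hid_dim, in_dim + hid_dim)  # candidate state
        self.hid_dim = hid_dim

    def step(self, x, h):
        z = [sigmoid(a) for a in matvec(self.Wz, x + h)]
        r = [sigmoid(a) for a in matvec(self.Wr, x + h)]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a) for a in matvec(self.Wh, x + rh)]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def run(cell, frames):
    """Unroll the cell over a sequence, starting from a zero state."""
    h = [0.0] * cell.hid_dim
    states = []
    for x in frames:
        h = cell.step(x, h)
        states.append(h)
    return states

def attentive_bigru(frames, fwd, bwd, score_vec):
    f_states = run(fwd, frames)
    b_states = run(bwd, frames[::-1])[::-1]   # re-align backward pass in time
    fused = [f + b for f, b in zip(f_states, b_states)]  # concatenate per step
    # Assumed attention: softmax over time of a linear score per fused state.
    scores = [sum(s * v for s, v in zip(score_vec, h)) for h in fused]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    refined = [[w * v for v in h] for w, h in zip(weights, fused)]
    return refined, weights

rng = random.Random(0)
frames = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 4 frames
fwd, bwd = GRUCell(3, 2, rng), GRUCell(3, 2, rng)
score_vec = [rng.uniform(-1, 1) for _ in range(4)]
refined, weights = attentive_bigru(frames, fwd, bwd, score_vec)
```

Because each fused state combines a forward and a backward pass, every frame's representation depends on both earlier and later frames, which is the property a bidirectional design relies on for temporal coherence.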

Summary

Introduction

Video salient object detection aims to continuously identify the motion-related salient objects that most strongly attract human attention in video sequences. This task is often used as a preprocessing step in many computer vision applications, such as video tracking, person re-identification, and video compression. Directly applying static image saliency detection strategies to dynamic video salient object detection is inappropriate, because consecutive video frames involve more sophisticated appearance information and continuous motion cues that static strategies cannot model effectively. For example, two camels appear in the 45th frame, and a static image saliency model detects both of them as salient objects (the last column in Figure 1 shows the results predicted by the state-of-the-art image saliency model DHS [3]).
