This paper introduces a novel video saliency model for salient object detection in videos. First, we generate multi-level deep features via a symmetrical convolutional neural network whose inputs are the current frame and the corresponding optical flow image. Then, the multi-level deep features are integrated in a hierarchical manner by a fusion network, which deploys an attention module to select among the deep features. Lastly, the integrated deep feature is combined with boundary information originating from the shallow layers of the feature extraction networks, and the saliency map is generated in the saliency prediction step. The key advantages of our model lie in the attention module, the hierarchical integration, and the boundary information: the first acts as a weight filter that selects the most salient regions in the deep features, the second provides an effective way to integrate features from different layers, and the third yields well-defined boundaries in the saliency map. Extensive experiments are performed on two challenging video datasets, and the results show that our model consistently outperforms state-of-the-art saliency models by a large margin.
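The abstract does not give implementation details, but the core idea of attention-weighted fusion of the two streams can be sketched. Below is a minimal NumPy illustration assuming a simple channel-attention formulation (softmax over globally pooled channel responses) and additive fusion; the function names, shapes, and the fusion rule are illustrative assumptions, not the authors' actual network.

```python
import numpy as np

def channel_attention(features):
    """Re-weight each channel of a feature map by a softmax over its
    global-average-pooled response (assumed attention formulation).
    features: array of shape (C, H, W)."""
    pooled = features.mean(axis=(1, 2))        # (C,) per-channel descriptors
    weights = np.exp(pooled - pooled.max())
    weights /= weights.sum()                   # softmax -> attention weights
    return features * weights[:, None, None]   # emphasised channels

def fuse(appearance_feats, motion_feats):
    """Fuse appearance (RGB) and motion (optical-flow) stream features
    after attention re-weighting; simple addition stands in for the
    paper's hierarchical fusion network."""
    return channel_attention(appearance_feats) + channel_attention(motion_feats)

rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal((8, 16, 16))    # features from the RGB stream
flow_feat = rng.standard_normal((8, 16, 16))   # features from the flow stream
fused = fuse(rgb_feat, flow_feat)
print(fused.shape)                             # fused map keeps the input shape
```

In a real two-stream network the attention weights would be learned and the fusion applied at several feature levels; this sketch only shows the weighting-then-combining pattern the abstract describes.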