Video saliency detection aims at discovering the motion-related and most noticeable object in a video sequence while maintaining the spatiotemporal consistency of the saliency maps, a task hampered by complicated motion and complex backgrounds. We propose a video saliency detection approach built on the defining properties of a moving object: detected regions should exhibit both motion and general objectness. With the assistance of different key-frame strategies, consistency propagation is implemented in turn, via sparse reconstruction over the constructed adjacency relationships, to refine the motion saliency results and the object proposals, improving low-quality detections and avoiding excessive error accumulation. A pixel-wise fusion strategy then integrates the refined object proposals with the refined salient motion detections to locate the most trustworthy moving-object regions and suppress detection noise, such as dynamic background and stationary objects. Moreover, a Bayesian fusion framework incorporating global features obtained under low-rank constraints is employed to further enhance the accuracy and global temporal consistency of the initial moving-object regions. Based on the resulting moving-object and background priors, a spatial saliency map is estimated with a geodesic distance transform to recover more complete and spatially salient moving objects. Finally, an energy optimization function is proposed to integrate multiple saliency cues (i.e., spatial saliency, global features, and moving objects) and generate a globally spatiotemporally consistent saliency map. Experimental results on three benchmark datasets demonstrate that the proposed method successfully infers moving video objects and extracts the most salient regions, outperforming state-of-the-art methods.
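The geodesic-distance spatial saliency step can be illustrated with a minimal sketch: given background seed pixels (a background prior), the geodesic distance of each pixel from the seeds, accumulated over appearance changes, serves as a saliency score. The grid connectivity, the absolute-intensity edge weights, and the border-seed choice below are illustrative assumptions, not the authors' actual features or graph construction.

```python
import heapq


def geodesic_saliency(img, seeds):
    """Geodesic distance transform on a 4-connected 2D grid (Dijkstra).

    img:   2D list of floats (an assumed per-pixel feature/intensity map)
    seeds: iterable of (row, col) background seed pixels
    The weight of an edge between neighboring pixels is the absolute
    intensity difference, so the returned map measures accumulated
    appearance change from the background seeds; a large distance
    suggests a salient (foreground) pixel.
    """
    h, w = len(img), len(img[0])
    dist = [[float("inf")] * w for _ in range(h)]
    pq = []
    for r, c in seeds:
        dist[r][c] = 0.0
        heapq.heappush(pq, (0.0, r, c))
    while pq:
        d, r, c = heapq.heappop(pq)
        if d > dist[r][c]:          # stale queue entry
            continue
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                nd = d + abs(img[r][c] - img[nr][nc])
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(pq, (nd, nr, nc))
    return dist


# Toy usage: a bright blob in a flat background, border pixels as seeds.
img = [[0.0] * 5 for _ in range(5)]
img[2][2] = 1.0
seeds = [(r, c) for r in range(5) for c in range(5)
         if r in (0, 4) or c in (0, 4)]
sal = geodesic_saliency(img, seeds)
# Background pixels stay near 0; the blob accumulates a large distance.
```

In the paper's pipeline the seeds would come from the refined background prior rather than the image border, and the edge weights from richer features than raw intensity; the transform itself is unchanged.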