2D image-based salient object detection (SOD) has been extensively explored, whereas SOD in 360° omnidirectional images has received far less research attention, and three major bottlenecks limit its performance. Firstly, the currently available training data are insufficient for training a deep 360° SOD model. Secondly, the visual distortions in 360° omnidirectional images usually result in a large feature gap between 360° images and 2D images; consequently, stage-wise training, a widely used solution to the training data shortage problem, becomes infeasible when conducting SOD in 360° omnidirectional images. Thirdly, the existing 360° SOD approach follows a multi-task methodology that performs salient object localization and segmentation-like saliency refinement simultaneously; this leads to an extremely large problem domain and makes the training data shortage dilemma even worse. To tackle these issues, this paper divides 360° SOD into a multi-stage task, the key rationale of which is to decompose the original complex problem domain into sequential, easier sub-problems that demand only small-scale training data. Meanwhile, we learn to rank the "object-level semantical saliency", aiming to locate salient viewpoints and objects accurately. Specifically, to alleviate the training data shortage problem, we have released a novel dataset named 360-SSOD, containing 1,105 360° omnidirectional images with manually annotated object-level saliency ground truth, whose semantical distribution is more balanced than that of the existing dataset. Moreover, we have compared the proposed method with 13 state-of-the-art (SOTA) methods, and all quantitative results demonstrate its performance superiority.