Abstract

Most existing saliency prediction research focuses on either single images or videos (more precisely, sequences of consecutive images). However, applying saliency prediction to drone exploration, which must consider multiple images taken from different view angles or locations to determine the direction to explore, requires saliency prediction over multiple discontinuous images. In this paper, we propose a deep relative saliency model (MS-Net) for such an application. MS-Net starts with a single-image saliency feature extraction network applied to each image separately and then integrates the images with a GCN-based mechanism, called multi-image saliency fusion, that learns relative saliency information among all the images. Finally, it predicts the saliency of each image by taking this relative information into account. Because no existing saliency prediction datasets contain such multiple discontinuous images, we randomly cropped a large number of sub-images from 360° images in existing 360° image saliency datasets to build our own dataset for both training and evaluation. Experimental results showed that the proposed MS-Net considerably outperformed both single-image and video saliency prediction methods and achieved performance comparable to that of 360° image saliency prediction even with only a limited number of fields of view, i.e., five sub-images, considered.
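
To make the described pipeline concrete, the following is a minimal PyTorch sketch of an MS-Net-style model: each image is encoded separately by a shared feature extractor, the per-image features are fused with a graph convolution over a fully connected image graph (one node per image), and a per-image head predicts a coarse saliency map. All module names, dimensions, and the fully connected graph structure are illustrative assumptions; the abstract does not specify the actual architecture.

# Hedged sketch of an MS-Net-style pipeline; names and sizes are
# illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One graph-convolution step over N image nodes: H' = relu(A_hat @ Linear(H))."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, h):  # h: (N, dim), one row per image
        n = h.size(0)
        # Fully connected image graph with self-loops, symmetrically normalized.
        a = torch.ones(n, n, device=h.device)
        d_inv_sqrt = a.sum(dim=1).rsqrt().diag()
        a_hat = d_inv_sqrt @ a @ d_inv_sqrt
        return torch.relu(a_hat @ self.linear(h))

class MSNetSketch(nn.Module):
    def __init__(self, feat_dim=128, map_size=32):
        super().__init__()
        self.map_size = map_size
        # Per-image feature extractor (a stand-in for the paper's
        # single-image saliency feature extraction network).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # GCN-based multi-image saliency fusion: nodes are images.
        self.fusion = SimpleGCNLayer(feat_dim)
        # Per-image head producing a coarse saliency map.
        self.head = nn.Linear(feat_dim, map_size * map_size)

    def forward(self, images):  # images: (N, 3, H, W), N discontinuous views
        h = self.encoder(images)   # (N, feat_dim), each image encoded separately
        h = h + self.fusion(h)     # mix in relative information among all images
        maps = self.head(h)        # (N, map_size**2)
        return torch.sigmoid(maps).view(-1, 1, self.map_size, self.map_size)

# Usage: five sub-images cropped from a 360° image, matching the paper's
# limited-field-of-view evaluation setting.
model = MSNetSketch()
crops = torch.rand(5, 3, 128, 128)
saliency = model(crops)            # (5, 1, 32, 32)

A fully connected graph is the simplest choice when the images share no spatial continuity, and the residual connection lets each image keep its own saliency evidence while incorporating relative information from the other views.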
