Abstract

A deep feature-based saliency model (DeepFeat) is developed to improve our understanding of how human fixations can be predicted. Conventional saliency models typically predict human visual attention from a small set of hand-crafted image cues. Although such models produce fixation predictions across a variety of image complexities, they are limited to the features they incorporate. In this paper, we utilize the deep features of convolutional neural networks by combining bottom-up (BU) and top-down (TD) saliency maps. The proposed framework is applied to the deep features of three popular deep convolutional neural networks (DCNNs). We use four evaluation metrics to measure the correspondence between the proposed saliency model and ground-truth fixations over two datasets. The results demonstrate that the deep features of DCNNs pretrained on the ImageNet dataset are strong predictors of human fixations, and that combining BU and TD saliency maps outperforms either the BU or the TD implementation alone. Moreover, compared to nine saliency models, including four state-of-the-art and five conventional saliency models, the proposed DeepFeat model outperforms the conventional models on all four evaluation metrics.
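The abstract does not spell out how the BU and TD maps are defined or how they are fused, so the following is only a minimal illustrative sketch of the general idea: extract deep features from an ImageNet-pretrained DCNN (here torchvision's VGG-16, one plausible choice) and fuse a bottom-up activation map with a top-down, channel-weighted map. The function name deepfeat_saliency, both map definitions, and the multiplicative fusion are assumptions for illustration, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Hypothetical sketch of a deep-feature saliency pipeline. VGG-16's
# convolutional trunk serves as the DCNN feature extractor.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()

def deepfeat_saliency(image: torch.Tensor) -> torch.Tensor:
    """image: (1, 3, H, W), ImageNet-normalized. Returns a (1, 1, H, W)
    saliency map in [0, 1]. All definitions below are assumptions."""
    with torch.no_grad():
        feats = vgg(image)                        # (1, C, h, w) deep features
    # BU map (assumption): average activation magnitude across channels.
    bu = feats.abs().mean(dim=1, keepdim=True)
    # TD map (assumption): channels weighted by their global activation,
    # a crude stand-in for class-driven, top-down channel relevance.
    w = feats.mean(dim=(2, 3), keepdim=True)      # (1, C, 1, 1) weights
    td = (w * feats).sum(dim=1, keepdim=True).clamp(min=0)
    # Fuse BU and TD, upsample to input resolution, normalize to [0, 1].
    sal = F.interpolate(bu * td, size=image.shape[-2:],
                        mode="bilinear", align_corners=False)
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)
```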
