Based on transfer learning, feature maps of deep convolutional neural networks (DCNNs) have been used to predict human visual attention. In this paper, we conduct extensive comparisons to investigate effects of feature maps on the predictions of the human visual attention using a deep features based saliency model framework. The feature maps of seven pretrained DCNNs are investigated using classical and class activation maps approaches. The performances of various saliency implementations are evaluated over four datasets using three metrics. The results demonstrate that deep feature maps of the pretrained DCNNs can be used to create saliency maps for the prediction of human visual attention. The incorporation of multiple levels of blurred and multi-scale feature maps improves the extraction of salient regions. Moreover, DCNNs pretrained using the Places dataset provide more localized objects that can be beneficial to the top-down saliency maps.
Read full abstract