Abstract

Deep neural networks have had a profound impact on achieving human-level performance in visual saliency prediction. However, it is still unclear how they learn their task and what this means for understanding the human visual system (HVS). In this work, we propose a framework to derive explainable saliency models from their corresponding deep architectures. Specifically, we explain a deep saliency model by analyzing four of its aspects: (1) intermediate activation maps of deep layers, (2) biologically plausible Log-Gabor (LG) filters for salient region identification, (3) positionally biased behavior of Log-Gabor filters, and (4) processing of color information, by establishing its relevance to the HVS. We consider four state-of-the-art (SOTA) deep saliency models, namely CMRNet, UNISAL, DeepGaze IIE, and MSI-Net, and interpret them using our proposed framework. We observe that the explainable models perform considerably better than classical SOTA models. We also find that CMRNet transforms the input RGB space, after the input layer, into a representation that is very close to the YUV color space. We then discuss the biological considerations and relevance of our framework with respect to a possible anatomical substratum of visual attention. We find a good correlation between components of the HVS and the base operations of the proposed technique. Hence, this generic explainable framework provides a new perspective on the relationship between classical methods/the HVS and DNN-based ones.
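As an illustration of the filter-based component mentioned above, the sketch below builds a Log-Gabor filter bank in the frequency domain using the standard radial/angular formulation and pools the responses into a crude conspicuity map. The number of scales, orientations, and bandwidth parameters are illustrative assumptions, not necessarily those used in the paper's framework.

```python
# Minimal sketch of a frequency-domain Log-Gabor filter bank (standard
# formulation); parameter values are assumptions for illustration only.
import numpy as np

def log_gabor_bank(rows, cols, n_scales=4, n_orients=6,
                   min_wavelength=6.0, mult=2.0,
                   sigma_on_f=0.65, theta_sigma_factor=1.5):
    """Return a list of frequency-domain Log-Gabor filters (rows x cols)."""
    # Normalised frequency coordinates, centred at DC (after fftshift)
    u = (np.arange(cols) - cols // 2) / cols
    v = (np.arange(rows) - rows // 2) / rows
    U, V = np.meshgrid(u, v)
    radius = np.sqrt(U**2 + V**2)
    radius[rows // 2, cols // 2] = 1.0          # avoid log(0) at DC
    theta = np.arctan2(-V, U)

    filters = []
    for s in range(n_scales):
        f0 = 1.0 / (min_wavelength * mult**s)   # centre frequency of scale s
        radial = np.exp(-(np.log(radius / f0))**2 /
                        (2 * np.log(sigma_on_f)**2))
        radial[rows // 2, cols // 2] = 0.0      # zero response at DC
        for o in range(n_orients):
            angle = o * np.pi / n_orients
            # Angular Gaussian around the filter orientation (wrapped difference)
            d_theta = np.arctan2(np.sin(theta - angle), np.cos(theta - angle))
            angular = np.exp(-d_theta**2 /
                             (2 * (theta_sigma_factor * np.pi / n_orients)**2))
            filters.append(radial * angular)
    return filters

# Hypothetical usage: filter a grayscale image and sum the response
# magnitudes into a simple conspicuity map.
img = np.random.rand(128, 128)
F = np.fft.fftshift(np.fft.fft2(img))
responses = [np.abs(np.fft.ifft2(np.fft.ifftshift(F * g)))
             for g in log_gabor_bank(*img.shape)]
conspicuity = np.sum(responses, axis=0)
```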
