Inverse-Based Approach to Explaining and Visualizing Convolutional Neural Networks

Hyuk Jin Kwon, Hyung Il Koo, Nam Ik Cho, Jae Woong Soh

doi:10.1109/tnnls.2021.3084757

Abstract

This article presents a new method for understanding and visualizing convolutional neural networks (CNNs). Most existing approaches to this problem focus on a global score and evaluate the pixelwise contribution of the inputs to that score. Although these approaches have succeeded on image classification tasks with well-defined global scores, the analysis of CNNs with multilabeled outputs or regression outputs has not yet been considered in the literature. To address this problem, we propose an inverse-based approach that computes the inverse of a feedforward pass to identify activations of interest in lower layers. We develop a layerwise inverse procedure based on two observations: 1) inverse results should have internal activations consistent with those of the original forward pass, and 2) a small amount of activation in the inverse results is desirable for human interpretability. Experimental results show that the proposed method allows us to analyze CNNs for classification and regression in the same framework. We demonstrate that our method successfully finds attributions in the inputs for image classification, with performance comparable to state-of-the-art methods. To compare methods visually, we introduce a novel plot of the tradeoff between the amount of activation and the rate of class reidentification. In the case of regression, our method shows that conventional CNNs for single-image super-resolution overlook a portion of the frequency bands, which may result in performance degradation.
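The abstract states the two principles behind the layerwise inverse but not the optimization itself. The following is a minimal sketch under our own assumptions, not the authors' procedure: a single fully connected ReLU layer; the consistency principle read as the box constraint 0 ≤ x_inv ≤ x (the inverse may only keep a part of each original input activation) together with ignoring units that were inactive in the forward pass; and the sparsity principle read as an L1 penalty, solved by projected gradient descent. The function name `inverse_layer` and its parameters `lam` and `steps` are hypothetical.

```python
import numpy as np


def inverse_layer(W, b, x, y_target, lam=0.01, steps=500):
    """Hypothetical layerwise inverse for one fully connected ReLU layer.

    Finds x_inv minimizing 0.5 * ||W_a @ x_inv - y_a||^2 + lam * sum(x_inv)
    subject to 0 <= x_inv <= x, where W_a and y_a keep only the units that
    were active (pre-activation > 0) in the original forward pass.
    """
    x = np.asarray(x, dtype=float)
    assert np.all(x >= 0), "sketch assumes a nonnegative (post-ReLU) input"
    # Units active in the original forward pass; inactive ones are ignored.
    active = (W @ x + b) > 0
    if not active.any():
        return np.zeros_like(x)
    Wa, ya = W[active], np.asarray(y_target, dtype=float)[active]
    # Step size 1/L, with L the largest eigenvalue of Wa^T Wa.
    step = 1.0 / np.linalg.norm(Wa, 2) ** 2
    x_inv = np.zeros_like(x)
    for _ in range(steps):
        # Gradient of the data term plus the L1 term (|x_inv| = x_inv here).
        grad = Wa.T @ (Wa @ x_inv - ya) + lam
        # Gradient step, then projection onto the box [0, x] (consistency).
        x_inv = np.clip(x_inv - step * grad, 0.0, x)
    return x_inv


# Toy layer: 3 inputs, 2 output units, zero bias.
W = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.zeros(2)
x = np.array([1.0, 2.0, 0.5])
y = np.maximum(W @ x + b, 0.0)              # original activations: [1.5, 2.5]
y_target = np.where([True, False], y, 0.0)  # keep only unit 0 as "of interest"

x_inv = inverse_layer(W, b, x, y_target)
print(np.round(x_inv, 3))
```

In this toy case the sketch switches off the second input, which feeds only the output unit that is not of interest, keeps the first input at its original value, and only partly retains the third input because it drives both units. The box projection is one simple way to encode "consistent with the forward pass"; the paper may enforce consistency differently.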
