Abstract
Deep learning explainability is often achieved through gradient-based approaches that attribute the network output to perturbations of the input pixels. However, the relevance of input pixels may be difficult to relate to relevant image features in some applications, e.g. diagnostic measures in medical imaging. The framework described in this paper shifts the attribution focus from pixel values to user-defined concepts. By checking whether certain diagnostic measures are present in the learned representations, experts can explain and trust the network output. Being post-hoc, our method does not alter the network training and can be easily plugged into the latest state-of-the-art convolutional networks. This paper presents the main components of the framework for attribution to concepts, introducing a spatial pooling operation on top of the feature maps to obtain a robust interpretability analysis. Furthermore, regularized regression is analyzed as a solution to regression overfitting in high-dimensional latent spaces. The versatility of the proposed approach is shown by experiments on two medical applications, namely histopathology and retinopathy, and on one non-medical task, handwritten digit classification. The obtained explanations are in line with clinicians' guidelines and complementary to widely used visualization tools such as saliency maps.
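To make the pipeline sketched above concrete, the snippet below illustrates how spatially pooled feature maps can be related to a continuous concept measure through regularized (ridge) regression, and how the resulting concept vector can be used to score the sensitivity of the network output to the concept. This is a minimal sketch under assumed inputs (toy feature maps, a toy diagnostic measure, precomputed gradients with respect to the pooled features); the function names and the use of scikit-learn's Ridge are illustrative choices, not the authors' exact implementation.

```python
# Hypothetical sketch: concept attribution via regularized regression
# on spatially pooled feature maps (names and parameters are assumptions).
import numpy as np
from sklearn.linear_model import Ridge


def spatial_pool(feature_maps: np.ndarray) -> np.ndarray:
    """Average-pool feature maps of shape (N, H, W, C) down to (N, C)."""
    return feature_maps.mean(axis=(1, 2))


def fit_concept_vector(feature_maps, concept_measure, alpha=1.0):
    """Regress a continuous concept measure (e.g. a diagnostic score)
    on pooled activations; the coefficient vector points in the latent
    direction of increasing concept presence."""
    pooled = spatial_pool(feature_maps)
    reg = Ridge(alpha=alpha)  # regularization mitigates overfitting in high dimensions
    reg.fit(pooled, concept_measure)
    concept_vector = reg.coef_ / np.linalg.norm(reg.coef_)
    return concept_vector, reg.score(pooled, concept_measure)  # unit vector, R^2


def concept_sensitivity(grad_wrt_pooled: np.ndarray, concept_vector: np.ndarray) -> np.ndarray:
    """Directional derivative of the class score along the concept direction,
    one value per example (positive = concept supports the predicted class)."""
    return grad_wrt_pooled @ concept_vector


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    maps = rng.normal(size=(64, 7, 7, 256))    # toy intermediate feature maps
    measure = rng.normal(size=64)              # toy continuous concept measure
    cv, r2 = fit_concept_vector(maps, measure)
    grads = rng.normal(size=(64, 256))         # toy gradients w.r.t. pooled features
    print(r2, concept_sensitivity(grads, cv)[:3])
```

In this sketch the determination coefficient of the regression indicates how well the concept is encoded in the chosen layer, while the sign and magnitude of the sensitivity scores suggest whether the concept contributes to the prediction.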