Abstract

Despite the recent success of deep neural networks (DNNs), it remains challenging to explain their decision-making processes, which are hidden behind numerous parameters and complex non-linear functions. To address this problem, explainable AI (XAI) aims to provide explanations corresponding to the learning and prediction processes of deep learning models. In this paper, we propose a novel representation learning framework, Describe, Spot and eXplain (DSX). Built upon the Transformer architecture, our proposed DSX framework is composed of two learning stages: descriptive prototype learning and discriminative prototype discovery. Given an input image, the former stage derives a set of descriptive representations, while the latter stage further identifies a discriminative subset, offering semantic interpretability for the associated classification task. While our DSX does not require any ground-truth attribute supervision during training, the derived visual representations can be readily associated with physical attributes provided by domain experts. Extensive experiments on fine-grained classification and person re-identification tasks qualitatively and quantitatively verify that our DSX model offers semantically meaningful interpretability with satisfactory recognition performance.
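
To make the two-stage design concrete, the following is a minimal, hypothetical sketch in Python (PyTorch). It is not the authors' implementation: the module names, prototype count, attention configuration, and top-k selection rule are all illustrative assumptions, intended only to show how descriptive prototypes might be derived from patch features and how a discriminative subset could then be selected for classification.

```python
# Illustrative sketch only; all names and hyperparameters are assumptions,
# not the actual DSX architecture, which the abstract does not specify.
import torch
import torch.nn as nn

class TwoStagePrototypeSketch(nn.Module):
    def __init__(self, num_prototypes=32, dim=256, num_classes=200, top_k=8):
        super().__init__()
        # Stage 1 (descriptive): learnable queries that attend over
        # Transformer patch features to form descriptive prototypes.
        self.prototype_queries = nn.Parameter(torch.randn(num_prototypes, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        # Stage 2 (discriminative): score each prototype and keep the
        # top-k most informative ones for the classification task.
        self.scorer = nn.Linear(dim, 1)
        self.classifier = nn.Linear(dim, num_classes)
        self.top_k = top_k

    def forward(self, patch_feats):  # patch_feats: (B, N_patches, dim)
        B = patch_feats.size(0)
        queries = self.prototype_queries.unsqueeze(0).expand(B, -1, -1)
        # Descriptive stage: each query summarizes one visual concept.
        protos, _ = self.cross_attn(queries, patch_feats, patch_feats)
        # Discriminative stage: rank prototypes and gather the top-k subset.
        scores = self.scorer(protos).squeeze(-1)          # (B, num_prototypes)
        idx = scores.topk(self.top_k, dim=1).indices      # (B, top_k)
        selected = torch.gather(
            protos, 1, idx.unsqueeze(-1).expand(-1, -1, protos.size(-1)))
        # Classify from the pooled discriminative subset; the returned
        # indices indicate which prototypes explain the prediction.
        return self.classifier(selected.mean(dim=1)), idx
```

Under this reading, interpretability comes from inspecting which prototypes survive the selection step for a given image; associating those prototypes with expert-provided physical attributes would happen post hoc, consistent with the abstract's claim that no attribute supervision is needed during training.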
