Abstract
Zero-shot learning (ZSL) has attracted significant attention due to its capabilities of classifying new images from unseen classes. To perform the classification task for ZSL, learning visual and semantic embeddings has been the main research approach in existing literature. At the same time, generating complementary explanations to justify the classification decision has remained largely unexplored. In this paper, we propose to address a new and challenging task, namely explainable zero-shot learning (XZSL), which aims to generate visual and textual explanations to support the classification decision. To accomplish this task, we build a novel Deep Multi-modal Explanation (DME) model that incorporates a joint visual-attribute embedding module and a multi-channel explanation module in an end-to-end fashion. In contrast to existing ZSL approaches, our visual-attribute embedding is associated not only with the decision, but also with new visual and textual explanations. For visual explanations, we first capture several attribute activation maps (AAM) and then merge them into a class activation map (CAM) that visually infers which region of an image is relevant to the class. Textual explanations are generated from the multi-channel explanation module, jointly integrating three long short-term memory models (LSTMs) each of which is conditioned on a different feature representation. Additionally, we suggest that the DME model can retain explanatory consistency for similar instances and explanatory diversity for diverse instances. We conduct qualitative and quantitative experiments to assess the model for ZSL classification and explanation. Specifically, the ablation studies verify the effectiveness of the components in our model. Our results on three well-known datasets are competitive with prior approaches. More importantly, the joint training of our embedding and explanation modules demonstrates mutual performance improvements between ZSL classification and explanation. We shed more light on DME to analyze and diagnose its advantages and limitations.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.