A Deep Multi-modal Explanation Model for Zero-shot Learning

Yu Liu, Tinne Tuytelaars

doi:10.1109/tip.2020.2975980

Abstract

Zero-shot learning (ZSL) has attracted significant attention for its ability to classify new images from unseen classes. Existing ZSL research has focused mainly on learning visual and semantic embeddings to perform this classification, whereas generating complementary explanations to justify the classification decision has remained largely unexplored. In this paper, we address a new and challenging task, explainable zero-shot learning (XZSL), which aims to generate visual and textual explanations that support the classification decision. To accomplish this task, we build a novel Deep Multi-modal Explanation (DME) model that incorporates a joint visual-attribute embedding module and a multi-channel explanation module in an end-to-end fashion. In contrast to existing ZSL approaches, our visual-attribute embedding is associated not only with the decision but also with new visual and textual explanations. For visual explanations, we first capture several attribute activation maps (AAMs) and then merge them into a class activation map (CAM) that shows which regions of an image are relevant to the class. Textual explanations are generated by the multi-channel explanation module, which jointly integrates three long short-term memory models (LSTMs), each conditioned on a different feature representation. Additionally, we suggest that the DME model can retain explanatory consistency for similar instances and explanatory diversity for diverse instances. We conduct qualitative and quantitative experiments to assess the model on ZSL classification and explanation. Specifically, the ablation studies verify the effectiveness of the components of our model, and our results on three well-known datasets are competitive with prior approaches. More importantly, the joint training of our embedding and explanation modules yields mutual performance improvements between ZSL classification and explanation. We further analyze DME to diagnose its advantages and limitations.
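The abstract does not say how the attribute activation maps are merged or how the embedding scores classes, so the following is only a minimal sketch under stated assumptions: the merge is taken to be a weighted sum of the per-attribute maps using the class's attribute strengths, and classification is taken to be a dot-product match between predicted attribute scores and class attribute signatures. The function names `merge_attribute_maps` and `predict_class` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

Map = List[List[float]]  # a 2-D activation map stored as nested lists


def merge_attribute_maps(aams: Sequence[Map], class_attributes: Sequence[float]) -> Map:
    """Merge per-attribute activation maps into one class activation map.

    Assumption (not from the paper): the merge is a weighted sum, with each
    attribute's map weighted by how strongly the class exhibits that attribute.
    """
    if len(aams) != len(class_attributes):
        raise ValueError("need exactly one weight per attribute map")
    height, width = len(aams[0]), len(aams[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for aam, weight in zip(aams, class_attributes):
        for r in range(height):
            for c in range(width):
                cam[r][c] += weight * aam[r][c]
    # Rescale to [0, 1] so the map could be overlaid on the image.
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = hi - lo
    if span == 0:
        return [[0.0] * width for _ in range(height)]
    return [[(value - lo) / span for value in row] for row in cam]


def predict_class(attribute_scores: Sequence[float],
                  class_signatures: Dict[str, Sequence[float]]) -> str:
    """Return the class whose attribute signature best matches the scores.

    Assumption: compatibility is a plain dot product. Unseen classes can be
    scored because only their attribute signatures are needed, not images.
    """
    def compatibility(signature: Sequence[float]) -> float:
        return sum(s * a for s, a in zip(attribute_scores, signature))

    return max(class_signatures, key=lambda name: compatibility(class_signatures[name]))


# Toy example: two attributes, 2x2 maps.
striped_map = [[1.0, 0.0], [0.0, 0.0]]
finned_map = [[0.0, 0.0], [0.0, 1.0]]
cam = merge_attribute_maps([striped_map, finned_map], [1.0, 0.5])
label = predict_class([0.9, 0.1], {"zebra": [1.0, 0.0], "whale": [0.0, 1.0]})
```

In this toy setup the merged map peaks where the strongly weighted attribute fires, which is the intuition behind reading a CAM as "the regions relevant to the class"; the actual DME model learns these quantities end to end rather than taking them as fixed inputs.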


Journal: IEEE Transactions on Image Processing
Publication Date: Jan 1, 2020
Citations: 83

