Disentangling Semantic-to-Visual Confusion for Zero-Shot Learning

Zihan Ye, Fuyuan Hu, Fan Lyu, Linyan Li, Kaizhu Huang

doi:10.1109/tmm.2021.3089017

Abstract

Using generative models to synthesize visual features from semantic distributions has been one of the most popular approaches to zero-shot learning (ZSL) image classification in recent years. The triplet loss (TL) is widely used to generate realistic visual distributions from semantics by automatically searching for discriminative representations. However, the traditional TL cannot find reliable disentangled representations for unseen classes, because samples of unseen classes are unavailable during training in ZSL. To alleviate this drawback, we propose a multi-modal triplet loss (MMTL) that uses multi-modal information to search for a disentangled representation space. As a result, all classes can interact, which benefits the learning of disentangled class representations in the searched space. Furthermore, we develop a novel model called Disentangling Class Representation Generative Adversarial Network (DCR-GAN), which exploits the disentangled representations in the training, feature synthesis, and final recognition stages. Benefiting from the disentangled representations, DCR-GAN can fit a more realistic distribution over both seen and unseen features. Extensive experiments show that our proposed model outperforms the state of the art on four benchmark datasets. Our code is available at https://github.com/FouriYe/DCRGAN-TMM.
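
For readers unfamiliar with the traditional triplet loss mentioned above, the following is a minimal sketch of its commonly used form, max(0, d(a, p) - d(a, n) + margin), where a is an anchor, p a same-class sample, and n a different-class sample. It is not the authors' implementation and does not reproduce MMTL, whose formulation the abstract does not give. The Euclidean distance, the margin value, and the toy 2-D features are all assumptions chosen for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor
    than the positive by at least `margin`.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


# Hypothetical 2-D features, for illustration only.
anchor = [0.0, 0.0]
near = [3.0, 4.0]   # distance 5 from the anchor
far = [6.0, 8.0]    # distance 10 from the anchor

# Positive is closer than the negative by more than the margin: no penalty.
satisfied = triplet_loss(anchor, positive=near, negative=far)

# Roles swapped: the positive is farther than the negative, so the loss is positive.
violated = triplet_loss(anchor, positive=far, negative=near)
```

Forming such triplets requires samples from the classes involved, which is why, as the abstract notes, the traditional loss cannot be applied directly to unseen classes.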

Journal: IEEE Transactions on Multimedia
Publication Date: Jan 1, 2022
Citations: 17

Similar Papers

Deep Metric Learning with Online Hard and Soft Selection for Person Re-identification
Mingyang Yu ... Sei-Ichiro Kamata
01 Jun 2018

Disentangled representation for sequential treatment effect estimation
Jiebin Chu ... Zhengxing Huang
Computer Methods and Programs in Biomedicine | VOL. 226
05 Oct 2022

Cross-Modality Person ReID with Maximum Intra-class Triplet Loss
Xiaojiang Hu ... Yue Zhou
01 Jan 2020

Multi-threshold deep metric learning for facial expression recognition
Wenwu Yang ... Jianbing Shen
Pattern Recognition | VOL. 156
01 Jul 2024
