Abstract

Deep learning recognition approaches can potentially perform better if we can extract a discriminative representation that controllably separates nuisance factors. In this paper, we propose a novel approach that explicitly enforces the extracted discriminative representation d, the extracted latent variation l (e.g., background, unlabeled nuisance attributes), and the semantic variation label vector s (e.g., labeled expressions/pose) to be independent and complementary to one another. We cast this problem as an adversarial game in the latent space of an auto-encoder. Specifically, given the to-be-disentangled s, we equip an end-to-end conditional adversarial network with the ability to decompose an input sample into d and l. However, we argue that maximizing the cross-entropy loss of semantic variation prediction from d is not sufficient to remove the influence of s from d, and that uniform-target and entropy regularization are necessary. A collaborative mutual information regularization framework is further proposed to avoid unstable adversarial training; it minimizes the differentiable mutual information between the variables to enforce independence. The resulting discriminative representation inherits the desired tolerance properties guided by prior knowledge of the task. The proposed framework achieves top performance on diverse recognition tasks, including digit classification, large-scale face recognition on the LFW and IJB-A datasets, and face recognition tolerant to changes in lighting, makeup, disguise, etc.
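To make the uniform-target and entropy regularization idea concrete, the sketch below (not the authors' code; all module sizes, names, and loss weights are illustrative assumptions) shows one common way such an adversarial disentanglement step can be set up in PyTorch: an auxiliary classifier tries to predict the semantic label s from the discriminative code d, while the encoder is trained so that this classifier's output stays close to the uniform distribution, pushing information about s out of d.

```python
# Minimal sketch of adversarial disentanglement with a uniform-target
# regularizer. Assumed components: a toy encoder producing d from a
# flattened input, and an adversary predicting s from d.

import torch
import torch.nn as nn
import torch.nn.functional as F

D_DIM, NUM_CLASSES = 128, 10  # assumed code size and number of semantic labels

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, D_DIM))
s_classifier = nn.Linear(D_DIM, NUM_CLASSES)  # adversary predicting s from d

opt_enc = torch.optim.Adam(encoder.parameters(), lr=1e-4)
opt_cls = torch.optim.Adam(s_classifier.parameters(), lr=1e-4)

def adversarial_step(x, s_labels, lam=1.0):
    # 1) Train the adversary to recover s from d (standard cross-entropy).
    d = encoder(x).detach()
    cls_loss = F.cross_entropy(s_classifier(d), s_labels)
    opt_cls.zero_grad(); cls_loss.backward(); opt_cls.step()

    # 2) Train the encoder so the adversary becomes uninformative on d.
    #    Rather than merely maximizing the adversary's cross-entropy, match
    #    its predicted distribution to the uniform target (equivalently,
    #    maximize its entropy), which is the regularization argued for above.
    d = encoder(x)
    log_p = F.log_softmax(s_classifier(d), dim=1)
    uniform = torch.full_like(log_p, 1.0 / NUM_CLASSES)
    enc_loss = lam * F.kl_div(log_p, uniform, reduction="batchmean")
    opt_enc.zero_grad(); enc_loss.backward(); opt_enc.step()
    return cls_loss.item(), enc_loss.item()
```

In the full framework this step would be combined with the auto-encoder reconstruction loss over (d, l, s) and the collaborative mutual information regularization; the sketch isolates only the uniform-target term to show why it differs from simply flipping the sign of the cross-entropy loss.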
