Abstract

Face reenactment aims to generate talking face images of a target person given a face image of a source person. Tackling this challenging task through domain mapping between source and target images requires learning a disentangled latent representation, in which the attributes or talking features tied to domains or conditions become adjustable for generating target images from source images. This article presents an information-theoretic attribute factorization (AF) in which the mixed features are disentangled for flow-based face reenactment. The latent variables of the flow model are factorized into attribute-relevant and attribute-irrelevant components without requiring paired face images. In particular, domain knowledge is learned to provide the condition for identifying talking attributes in real face images. The AF is guided by multiple losses for source structure, target structure, random-pair reconstruction, and sequential classification. The random-pair reconstruction loss is computed by exchanging the attribute-relevant components within a sequence of face images. In addition, a new mutual information flow is constructed for disentanglement toward domain mapping, condition irrelevance, and condition relevance. The disentangled features are learned and controlled to generate image sequences with meaningful interpretation. Experiments on mouth reenactment illustrate the merits of the individual and hybrid models for conditional generation and mapping based on the informative AF.
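The random-pair reconstruction idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a flow latent laid out so that the first `attr_dim` entries are the attribute-relevant component, assumes the attribute-irrelevant component is shared across frames of the same sequence, and uses a hypothetical `decode` callable standing in for the inverse flow.

```python
import numpy as np

def split_latent(z, attr_dim):
    # Partition a flow latent into an attribute-relevant part and an
    # attribute-irrelevant part (assumed layout: first `attr_dim`
    # entries carry the talking attribute).
    return z[:attr_dim], z[attr_dim:]

def random_pair_reconstruction_loss(z_i, z_j, x_i, x_j, decode, attr_dim):
    # Exchange the attribute-relevant components of two latents drawn
    # from the same sequence. Because the attribute-irrelevant part is
    # assumed shared within a sequence, decoding frame i's irrelevant
    # part with frame j's attribute should reconstruct frame j, and
    # vice versa. The loss scores both swapped reconstructions.
    a_i, r_i = split_latent(z_i, attr_dim)
    a_j, r_j = split_latent(z_j, attr_dim)
    x_hat_j = decode(np.concatenate([a_j, r_i]))  # should resemble x_j
    x_hat_i = decode(np.concatenate([a_i, r_j]))  # should resemble x_i
    return np.mean((x_hat_j - x_j) ** 2) + np.mean((x_hat_i - x_i) ** 2)
```

Under this reading, a perfectly disentangled latent makes the loss vanish whenever the attribute-irrelevant parts of the pair truly coincide, which is what encourages the factorization during training.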
