Disentanglement learning aims to separate the explanatory factors of variation so that different attributes of the data can be well characterized and isolated, promoting efficient inference for downstream tasks. Mainstream disentanglement approaches based on generative adversarial networks (GANs) learn interpretable data representations. However, most GAN-based works do not discuss the latent subspace, and therefore give insufficient consideration to the independent variation of factors. Although some recent research analyzes the latent space of pretrained GANs for image editing, it does not emphasize learning representations directly from the subspace perspective. Appropriate subspace properties can facilitate the corresponding feature representation learning to satisfy the independent-variation requirements of the obtained explanatory factors, which is crucial for better disentanglement. In this work, we propose a unified framework for ensuring disentanglement that fully investigates latent subspace learning (SL) in GANs. The novel architecture learns an orthogonal subspace representation (OSR) on a vanilla GAN and is named OSRGAN. To guide the subspace toward strong correlation, low redundancy, and robust distinguishability, OSR comprises three stages: self-latent-aware, orthogonal subspace-aware, and structure representation-aware. First, the self-latent-aware stage encourages a latent subspace strongly correlated with the data space so as to discover interpretable factors, although these factors still vary with poor independence. Second, the orthogonal subspace-aware stage adaptively learns 1-D linear subspaces, each spanned by one of a set of orthogonal basis vectors in the latent space; the reduced redundancy between them expresses the corresponding independence. Third, the structure representation-aware stage aligns the projections onto the orthogonal subspaces with the latent variables, so the feature representation in each linear subspace becomes distinguishable, enhancing the independent expression of interpretable factors. In addition, we design an alternating optimization step that trades off these properties during training. Although orthogonality is strictly constrained, the loss weight on the distinguishability induced by orthogonality can be adjusted and balanced against the correlation constraint. This tradeoff prevents OSRGAN from overemphasizing any single property and damaging the expressiveness of the feature representation; it accounts for both interpretable factors and their independent variation. Meanwhile, alternating optimization keeps the cost and efficiency of forward inference unchanged and adds no computational burden. In theory, we clarify the significance of OSR: it yields better independence of factors along with interpretability, since the correlation converges to a high range faster. Moreover, an analysis of convergence behavior, covering the objective functions under different constraints and the evaluation curves over iterations, shows that our model is more stable and converges to a higher disentanglement peak. To assess performance on downstream tasks, we compare against state-of-the-art GAN-based and even VAE-based approaches on different datasets. Our OSRGAN achieves higher disentanglement scores on the FactorVAE, SAP, MIG, and VP metrics. All experimental results illustrate that our novel GAN-based framework has considerable advantages in disentanglement.
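To make the subspace ideas concrete, the minimal sketch below illustrates one plausible reading of the orthogonal subspace-aware and structure representation-aware terms, together with an alternating update. All names, loss forms, and the schedule (`orthogonality_loss`, `alignment_loss`, `lam`, `subspace_opt`) are hypothetical assumptions; only the notions of orthogonal 1-D subspaces, projection alignment, and alternating optimization come from the abstract, and this is not OSRGAN's actual implementation.

```python
# Hypothetical sketch of subspace losses suggested by the abstract:
# orthogonality of basis directions and alignment of subspace
# projections with latent codes, optimized on alternating steps.
import torch
import torch.nn.functional as F

def orthogonality_loss(basis: torch.Tensor) -> torch.Tensor:
    """||B^T B - I||_F^2: each column of B spans one 1-D linear
    subspace; the penalty drives the columns toward orthonormality."""
    gram = basis.t() @ basis
    eye = torch.eye(basis.shape[1], device=basis.device)
    return ((gram - eye) ** 2).sum()

def alignment_loss(z: torch.Tensor, basis: torch.Tensor) -> torch.Tensor:
    """Align latent codes with their projections onto the learned
    subspaces, keeping per-subspace coordinates distinguishable."""
    coords = z @ basis                # (batch, k) subspace coordinates
    return F.mse_loss(coords @ basis.t(), z)

d, k = 128, 10                        # latent dim, number of factors
basis = torch.nn.Parameter(torch.randn(d, k) * d ** -0.5)
subspace_opt = torch.optim.Adam([basis], lr=1e-3)
lam = 0.1                             # hypothetical tradeoff weight

for step in range(100):               # alternates with GAN updates
    z = torch.randn(64, d)            # batch of latent codes
    # ... an ordinary generator/discriminator update would go here ...
    subspace_opt.zero_grad()
    loss = orthogonality_loss(basis) + lam * alignment_loss(z, basis)
    loss.backward()
    subspace_opt.step()
```

In this reading, `lam` plays the role of the adjustable weight that balances the distinguishability term against the correlation constraint, and the subspace update runs on separate steps so the forward inference path of the GAN is untouched.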