The capability to disentangle the underlying factors hidden in observable data, and thereby obtain abstract representations of them, is considered an important ingredient in the success of deep networks across various application scenarios. Recently, numerous practical measures and learning strategies have been established for disentanglement, showcasing their potential for improving a model's explainability, controllability, and robustness. However, when the downstream task is classification, there is still no consensus in the community on the definition or measurement of disentanglement, and its connection to generalization capacity remains unclear. Motivated by this, we explore the highly non-linear effect of a specified hidden layer on the generalization capacity from an information-theoretic perspective and obtain a tight bound. Upon decomposing the bound, we find that, besides the unsupervised disentanglement term in the conventional sense, a new supervised disentanglement term also emerges with a non-negligible effect on generality. Consequently, a novel label-based disentanglement measure (LDM) is naturally introduced as the discrepancy between these two terms under supervised learning settings, to substitute for the commonly used unsupervised disentanglement measure. The theoretical analysis reveals an inverse relationship between the defined LDM and the generalization capacity. Finally, using LDM as a regularizer, experiments show that deep neural networks (DNNs) can effectively reduce generalization error while improving classification accuracy when noise is added to the data features or labels, which strongly supports our claims.
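Since the abstract does not give the exact LDM formula, the sketch below only illustrates the general recipe it describes: take the discrepancy between an unsupervised disentanglement term and a label-conditioned (supervised) one, and add it to the classification loss as a regularizer. The covariance-based `offdiag_corr` proxy, the `ldm_proxy` name, and the weight `lambda_ldm` are all illustrative assumptions, not the paper's definitions.

```python
# Illustrative sketch only: a simple correlation-based proxy for the
# unsupervised and label-conditioned disentanglement terms, and their
# discrepancy used as a regularizer. Not the paper's actual LDM.
import torch

def offdiag_corr(z):
    """Mean absolute off-diagonal correlation of latent codes z of shape (N, d)."""
    z = (z - z.mean(0)) / (z.std(0) + 1e-8)
    c = (z.T @ z) / z.shape[0]
    d = c.shape[0]
    return (c.abs().sum() - c.diagonal().abs().sum()) / (d * (d - 1))

def ldm_proxy(z, y, num_classes):
    """Hypothetical proxy: |unsupervised term - label-conditioned term|."""
    unsup = offdiag_corr(z)  # correlation across the whole batch
    per_class = [offdiag_corr(z[y == k]) for k in range(num_classes)
                 if (y == k).sum() > 1]  # correlation within each class
    sup = torch.stack(per_class).mean()
    return (unsup - sup).abs()

# Assumed usage inside a training step (model returns logits and hidden code z):
# logits, z = model(x)
# loss = F.cross_entropy(logits, y) + lambda_ldm * ldm_proxy(z, y, num_classes)
# loss.backward(); optimizer.step()
```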