Abstract

Deep neural networks often suffer performance degradation when the test data distribution differs significantly from the training data distribution. To address this problem, most domain generalization (DG) approaches focus on learning domain-invariant features from multiple source domains. However, enforcing invariance across domains can also discard information that is not shared among the training domains, even though it may still be relevant to unseen domains. We argue that a well-generalized model should remain discriminative with respect to the comprehensive set of features across different domains. To avoid such unexpected information loss, we propose a two-stage learning scheme, named Domain-aware Knowledge Distillation (DAKD). In the first stage, we pre-train a parameter-efficient multi-expert model, where each expert is responsible for a specific source domain and preserves its domain-specific features. In the second stage, following the traditional knowledge distillation formulation, we train a new student network assisted by the pre-trained experts, where the student's predictions on samples from each source domain are regularized by the soft outputs of the corresponding expert. A comprehensive ablation study and analysis show that the distilled model preserves more source-domain-specific features and achieves higher accuracy on unseen domains than its counterpart trained without distillation. We also relate our method to generalization risk bound theory: the upper bound is tightened by mitigating the source-target domain discrepancy and reducing the risk on the source domains. Experiments demonstrate that our method achieves state-of-the-art performance on PACS as well as on more challenging datasets such as Office-Home and DomainNet. The source code is available at https://github.com/ZZQ321/DAKD.git
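
To make the second stage concrete, the sketch below shows what a per-domain distillation objective of this kind could look like: the student's logits on a batch drawn from one source domain are matched against the soft outputs of that domain's frozen expert, alongside the usual hard-label loss. This is a minimal illustration assuming a standard knowledge-distillation loss; the function name, the temperature, and the weighting factor alpha are placeholder assumptions, not the hyperparameters or exact formulation used in DAKD.

```python
import torch
import torch.nn.functional as F

def domain_aware_kd_loss(student_logits, expert_logits, labels,
                         temperature=4.0, alpha=0.5):
    """Hypothetical per-domain distillation loss (illustrative only).

    student_logits: student predictions on a batch from one source domain.
    expert_logits:  soft targets from that domain's pre-trained, frozen expert.
    temperature, alpha: placeholder values; the abstract does not specify them.
    """
    # Hard-label cross-entropy on the source-domain batch.
    ce = F.cross_entropy(student_logits, labels)

    # Soft-target term, as in standard knowledge distillation:
    # KL divergence between temperature-softened expert and student outputs.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(expert_logits.detach() / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

    return (1 - alpha) * ce + alpha * kd
```

In a training loop, each mini-batch would be routed to the expert matching its source domain, so the student is regularized domain by domain rather than by a single shared teacher.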
