Privacy and security concerns restrict access to original training datasets, posing significant challenges for model compression. Data-Free Knowledge Distillation (DFKD) emerges as a solution, aiming to transfer knowledge from a teacher to a student network without accessing the original data. Existing DFKD methods struggle to generate high-quality synthetic samples that capture the complexity of real-world data, leading to suboptimal knowledge transfer. Moreover, these approaches often fail to preserve the spatial attributes of the teacher network, resulting in shortcut learning and limited generalization. To address these issues, a novel DFKD strategy is proposed with three innovations: (1) an enhanced DCGAN generator with an attention module for synthesizing samples with improved micro-discriminative features; (2) a Multi-Scale Spatial Activation Region Consistency (MSARC) mechanism to accurately replicate the teacher's spatial attributes; and (3) an adversarial learning framework that creates a dynamic competitive environment between the generation and distillation phases. Rigorous evaluation on several benchmark datasets, including CIFAR-10, CIFAR-100, Tiny-ImageNet, and the medical imaging datasets PathMNIST, BloodMNIST, and PneumoniaMNIST, demonstrates superior performance over existing DFKD methods. Specifically, on CIFAR-100 the student network attains 77.85% accuracy, surpassing previous methods such as CMI and SpaceshipNet, and on BloodMNIST the method achieves 80.50% accuracy, outperforming the next best method by over 5%.
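The abstract gives no implementation details, but the overall structure it describes (an adversarial game between a generator and the distillation step, plus a multi-scale spatial consistency term) follows a recognizable pattern. Below is a minimal PyTorch sketch of how such a loop could be wired up. Everything here is an illustrative assumption, not the authors' code: `spatial_consistency_loss` is one plausible form of an MSARC-style term, and the teacher/student models are assumed to return `(logits, feature_maps)` tuples so intermediate activations are available.

```python
import torch
import torch.nn.functional as F

def spatial_consistency_loss(t_feats, s_feats):
    """Assumed MSARC-style term: match channel-averaged, L2-normalized
    spatial activation maps between teacher and student at several scales.
    t_feats/s_feats are lists of [B, C, H, W] feature tensors."""
    loss = 0.0
    for t, s in zip(t_feats, s_feats):
        t_map = F.normalize(t.mean(dim=1).flatten(1), dim=1)  # [B, H*W]
        s_map = F.normalize(s.mean(dim=1).flatten(1), dim=1)
        loss = loss + F.mse_loss(s_map, t_map)
    return loss

def train_step(generator, teacher, student, g_opt, s_opt,
               batch_size=64, z_dim=128, lam=1.0, T=4.0):
    """One round of the generator-vs-student game (hypothetical names)."""
    # -- Generator step: synthesize samples that maximize teacher-student
    #    disagreement (the adversarial objective). The teacher stays frozen.
    z = torch.randn(batch_size, z_dim)
    fake = generator(z)
    with torch.no_grad():
        t_logits, _ = teacher(fake)
    s_logits, _ = student(fake)
    g_loss = -F.kl_div(F.log_softmax(s_logits / T, dim=1),
                       F.softmax(t_logits / T, dim=1),
                       reduction="batchmean")
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

    # -- Student step: on the same (now fixed) samples, minimize the KD
    #    loss plus the multi-scale spatial consistency term.
    fake = fake.detach()
    with torch.no_grad():
        t_logits, t_feats = teacher(fake)
    s_logits, s_feats = student(fake)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    s_loss = kd + lam * spatial_consistency_loss(t_feats, s_feats)
    s_opt.zero_grad(); s_loss.backward(); s_opt.step()
    return g_loss.item(), s_loss.item()
```

The design choice worth noting is the opposing signs of the two KL terms: the generator is rewarded for samples the student handles poorly, which keeps pushing the synthetic data toward the student's weaknesses, while the consistency term constrains the student to reproduce where, spatially, the teacher attends, which is what the abstract credits with reducing shortcut learning.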