Abstract

Huge computational requirements and memory footprints limit the practical deployment of super-resolution (SR) models. Knowledge distillation (KD) allows student networks to improve their performance by learning from over-parameterized teacher networks. Previous work has addressed the SR distillation problem with feature-based distillation, which ignores the supervisory role of the teacher modules themselves. In this paper, we introduce a cross knowledge distillation framework to compress and accelerate SR models. Specifically, we propose to obtain supervision by cascading the student into the teacher network, directly utilizing the teacher's well-trained parameters. This not only reduces the optimization difficulty for the student but also avoids designing alignment between the two networks' obscure feature textures. To the best of our knowledge, this is the first work to explore the cross distillation paradigm for SR tasks. Experiments on typical SR networks demonstrate the superiority of our method in terms of generated image quality, PSNR, and SSIM.
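
Below is a minimal sketch, in PyTorch, of the cascading idea described above. The module names (`TeacherSR`, `StudentSR`), the split point between feature body and upsampling tail, and the particular combination of losses are illustrative assumptions, not the paper's exact design.

```python
# Illustrative sketch only: the architecture split and loss combination are
# assumptions made for this example, not the paper's published implementation.
import torch
import torch.nn as nn

class TeacherSR(nn.Module):
    """Stand-in teacher: a feature body followed by an upsampling tail."""
    def __init__(self, channels=64, scale=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.tail = nn.Sequential(
            nn.Conv2d(channels, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )

class StudentSR(nn.Module):
    """Stand-in student: a lighter feature body with matching channel width."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
        )

def cross_distill_step(student, teacher, lr, hr, optimizer, l1=nn.L1Loss()):
    """One training step: cascade student features through the frozen teacher tail."""
    teacher.eval()
    for p in teacher.parameters():
        p.requires_grad_(False)

    # Student features are pushed through the teacher's well-trained tail,
    # so the supervision signal reaches the student through fixed teacher
    # parameters rather than through hand-designed feature alignment.
    sr_cross = teacher.tail(student.body(lr))
    with torch.no_grad():
        sr_teacher = teacher.tail(teacher.body(lr))

    # Supervise the cascaded output with both the ground-truth HR image and
    # the teacher's own prediction (an assumed combination for illustration).
    loss = l1(sr_cross, hr) + l1(sr_cross, sr_teacher)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the teacher tail is frozen, gradients flow through its fixed, well-trained parameters back to the student body, which is the "directly utilizing teacher's parameters" effect the abstract refers to.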
