Abstract

Knowledge distillation is a model compression technique that can effectively improve the performance of a small student network by transferring knowledge from a large pre-trained teacher network. In most previous feature distillation works, the student's performance remains below the teacher's because the student is supervised only by the teacher's features and the labels. In this paper, we propose Cross-layer Fusion for Knowledge Distillation (CFKD). Specifically, instead of using only the features of the teacher network, we aggregate the features of the teacher and student networks with a dynamic feature fusion strategy (DFFS) and a fusion module. The fused features are informative: they contain not only the expressive knowledge of the teacher network but also the useful knowledge already learned by the student network. Therefore, a student network that learns from the fused features can achieve performance comparable to the teacher network. Our experiments demonstrate that a student trained with our method can approach, and in some cases surpass, the performance of the teacher network.
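
The abstract does not describe the internals of the fusion module or of DFFS, so the sketch below is only an illustration of the general idea under assumptions: the student's feature map is channel-aligned to the teacher's with a 1x1 convolution, a learnable gate (a stand-in for DFFS) mixes the two maps, and the student is then pulled toward the fused target with an L2 loss. All module names, shapes, and loss weights here are hypothetical.

```python
# Minimal sketch of cross-layer feature fusion for distillation.
# NOT the paper's implementation: the gate, alignment, and loss are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusionModule(nn.Module):
    """Fuse a teacher feature map with a student feature map.

    A 1x1 conv aligns the student channels to the teacher's, and a learnable
    gate decides, per location, how much of each source enters the fused
    target (an assumed stand-in for the paper's DFFS).
    """

    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        self.align = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)
        self.gate = nn.Sequential(
            nn.Conv2d(2 * teacher_channels, teacher_channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, f_student: torch.Tensor, f_teacher: torch.Tensor) -> torch.Tensor:
        f_student = self.align(f_student)
        if f_student.shape[-2:] != f_teacher.shape[-2:]:
            # Match spatial size when the teacher and student layers disagree.
            f_student = F.interpolate(
                f_student, size=f_teacher.shape[-2:], mode="bilinear", align_corners=False
            )
        gate = self.gate(torch.cat([f_student, f_teacher], dim=1))
        # Fused target: a gated mix of teacher and (aligned) student features.
        return gate * f_teacher + (1.0 - gate) * f_student


def feature_fusion_loss(
    fusion: FusionModule, f_student: torch.Tensor, f_teacher: torch.Tensor
) -> torch.Tensor:
    """Pull the student's features toward the fused features.

    The fused target is detached so the student chases a fixed target each
    step; how (or whether) the fusion module itself is further supervised is
    not specified in the abstract and is omitted here.
    """
    fused = fusion(f_student, f_teacher).detach()
    aligned = fusion.align(f_student)
    if aligned.shape[-2:] != fused.shape[-2:]:
        aligned = F.interpolate(
            aligned, size=fused.shape[-2:], mode="bilinear", align_corners=False
        )
    return F.mse_loss(aligned, fused)


# Example with dummy feature maps (shapes are illustrative only).
fusion = FusionModule(student_channels=64, teacher_channels=256)
f_s = torch.randn(8, 64, 16, 16)   # student feature map
f_t = torch.randn(8, 256, 16, 16)  # teacher feature map (teacher is frozen)
loss = feature_fusion_loss(fusion, f_s, f_t)
```

In practice this feature-level term would be added to the usual task loss (and possibly a logit-distillation loss) with some weighting; the abstract does not state the exact training objective.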
