Abstract

Model fusion can effectively improve prediction accuracy, but it increases inference time. In this paper, dual-stage progressive knowledge distillation is improved by combining it with multi-teacher knowledge distillation. A simple and effective method for integrating the soft targets of multiple teachers is proposed, which strengthens the guiding role of well-performing models during distillation. Dual-stage progressive knowledge distillation is a method for knowledge distillation with small samples: a progressive network-grafting strategy is used to realize distillation in a small-sample environment. In the first stage, student blocks are grafted one by one onto the teacher network and trained interleaved with the remaining teacher blocks, with only the parameters of the grafted block being updated. In the second stage, the trained student blocks are grafted onto the teacher network in turn, so that the learned student blocks adapt to one another and finally replace the teacher network, yielding a lighter network structure. Using the soft targets obtained by this integration method for training in dual-stage progressive knowledge distillation, instead of hard targets, excellent results were obtained on the BreakHis dataset.
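
The abstract does not spell out the integration formula, so the sketch below is only an illustration of the general idea: fusing temperature-softened outputs from several teachers into a single soft target and using it in place of hard labels in the distillation loss. The weighting scheme, function names, and the choice of a KL-divergence objective are assumptions for this sketch, not the paper's actual method.

```python
import torch
import torch.nn.functional as F

def fuse_teacher_soft_targets(teacher_logits, weights=None, temperature=4.0):
    """Combine softened outputs from several teachers into one soft target.

    teacher_logits: list of [batch, num_classes] tensors, one per teacher.
    weights: optional per-teacher weights (e.g., reflecting each teacher's
             validation accuracy); defaults to a uniform average. This
             weighting is an assumption, not the paper's integration rule.
    """
    if weights is None:
        weights = [1.0 / len(teacher_logits)] * len(teacher_logits)
    fused = torch.zeros_like(teacher_logits[0])
    for w, logits in zip(weights, teacher_logits):
        fused = fused + w * F.softmax(logits / temperature, dim=1)
    return fused

def distillation_loss(student_logits, fused_soft_target, temperature=4.0):
    """KL divergence between the student's softened prediction and the fused soft target."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, fused_soft_target, reduction="batchmean") * temperature ** 2
```

In a small-sample setting such as the one described here, this fused soft target would supervise the grafted student blocks during both stages, while only the grafted block's parameters are updated in the first stage.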
