Abstract

Model performance can be further improved by providing extra guidance beyond the one-hot ground truth. To this end, recently proposed recollection-based methods exploit the valuable information contained in the past training history and derive a "recollection" from it, which serves as a data-driven prior to guide training. In this article, we focus on two fundamental aspects of this approach, i.e., recollection construction and recollection utilization. Specifically, to meet the varying demands of models with different capacities and at different training stages, we propose to construct a set of recollections with diverse distributions from the same training history. These recollections then collaborate to provide guidance that adapts to different model capacities as well as different training stages, according to our similarity-based elastic knowledge distillation (KD) algorithm. Without any external prior to guide training, our method achieves a significant performance gain, outperforms methods of the same category, and even performs on par with KD using a well-trained teacher. Extensive experiments and further analysis demonstrate the effectiveness of our method.
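To make the idea of similarity-based elastic KD concrete, below is a minimal PyTorch sketch of one plausible realization: the student's soft predictions are compared against several recollection distributions, each recollection is weighted by its similarity to the current predictions, and the weighted KD term is combined with the standard cross-entropy loss. The function name, the cosine-similarity weighting, and the loss mixing coefficient are illustrative assumptions, not the exact formulation from the paper.

```python
import torch
import torch.nn.functional as F

def elastic_kd_loss(student_logits, recollections, labels,
                    temperature=4.0, alpha=0.5):
    """Similarity-weighted KD against a set of recollection distributions.

    student_logits: (B, C) current model outputs.
    recollections:  list of (B, C) soft-label tensors derived from training history.
    labels:         (B,) ground-truth class indices.
    """
    # Standard supervised term on the one-hot ground truth.
    ce = F.cross_entropy(student_logits, labels)

    student_probs = F.softmax(student_logits / temperature, dim=-1)

    # Weight each recollection by how similar it is to the current predictions,
    # so the guidance adapts to the model's capacity and training stage
    # (assumed weighting scheme for illustration).
    sims = torch.stack([
        F.cosine_similarity(
            student_probs, F.softmax(r / temperature, dim=-1), dim=-1
        ).mean()
        for r in recollections
    ])
    weights = F.softmax(sims, dim=0)

    # Weighted sum of KL-divergence terms, one per recollection.
    kd = torch.zeros((), device=student_logits.device)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    for w, r in zip(weights, recollections):
        teacher_probs = F.softmax(r / temperature, dim=-1)
        kd = kd + w * F.kl_div(log_student, teacher_probs,
                               reduction="batchmean") * temperature ** 2

    return (1 - alpha) * ce + alpha * kd
```

In this sketch, recollections closer to the student's current prediction distribution receive larger weights, which is one way to let the guidance "elastically" track the model as it grows stronger during training.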
