Abstract

Visual recognition requires incremental learning to scale its underlying deep learning models with continuous data growth. The central scalability challenge is maintaining the balance between effectiveness (accuracy) and efficiency (computational requirements), because the storage, computation time, and memory required to process both old and new data grow rapidly. This paper investigates the scalability of incremental deep learning for visual recognition, specifically for fast object detection applications. The experimental study compares the knowledge retention and computational expense of training-at-once with those of incremental learning using knowledge transfer and distillation. The experiments were based on a state-of-the-art object detector, extended to incorporate knowledge transfer and distillation, in order to benchmark three training approaches: training-at-once, transfer learning without distillation, and transfer learning with distillation. The results and analysis examine the pros and cons of each training approach while adjusting key parameters, focusing on the accuracy of new classes, knowledge retention of old classes, data storage, computation time, and memory usage. Training-at-once (the baseline) yielded the highest accuracy on both new and old classes, at the expense of the largest storage and memory usage. Compared to the baseline, both transfer learning approaches reduced the storage requirement by 73% but increased computation time by 53%. Transfer learning with distillation was important for knowledge retention, maintaining 96% accuracy on old classes, which indicates its ability to handle long-term incremental learning. Compared to using distillation, transfer learning without distillation achieved slightly better accuracy on new classes (−53% versus −60% relative to the baseline) and lower memory usage (−65% versus +26%), but at the expense of completely forgetting the old classes (−100%). This study confirms that a distillation loss can help balance the accuracy of old and new object classes while retaining all the benefits of incremental learning. Experiments varying key parameters across all training approaches confirmed that the training batch size and the number of assigned classes play an important role in maintaining the accuracy of new classes, retaining the knowledge of old classes, and reducing the computational cost.
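The abstract does not reproduce the loss formulation used in the experiments. The sketch below shows one common way to combine a task loss on new data with a distillation term that keeps the old-class outputs close to those of the previously trained (frozen) model, in the spirit of transfer learning with distillation described above. The function name incremental_distillation_loss and the parameters alpha and temperature are illustrative assumptions, not the authors' implementation, and a full object detector would also include localization losses.

import torch
import torch.nn.functional as F

def incremental_distillation_loss(student_logits, teacher_old_logits, targets,
                                  num_old_classes, temperature=2.0, alpha=0.5):
    # Hard-label loss on the new training data, computed over all (old + new) class heads.
    task_loss = F.cross_entropy(student_logits, targets)

    # Soft-label distillation loss restricted to the old-class outputs:
    # the student is pulled toward the frozen teacher's softened predictions.
    student_old = F.log_softmax(student_logits[:, :num_old_classes] / temperature, dim=1)
    teacher_old = F.softmax(teacher_old_logits / temperature, dim=1)
    distill_loss = F.kl_div(student_old, teacher_old, reduction="batchmean") * (temperature ** 2)

    # alpha trades off new-class accuracy against retention of old-class knowledge.
    return (1.0 - alpha) * task_loss + alpha * distill_loss

# Illustrative usage with random tensors (2 old classes, 3 new classes, batch of 4):
num_old, num_new, batch = 2, 3, 4
student_scores = torch.randn(batch, num_old + num_new)
teacher_scores = torch.randn(batch, num_old)
labels = torch.randint(0, num_old + num_new, (batch,))
loss = incremental_distillation_loss(student_scores, teacher_scores, labels, num_old)

Setting alpha to zero reduces this to plain transfer learning without distillation, which matches the trade-off reported above: better new-class accuracy and lower memory usage, but complete forgetting of the old classes.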
