Abstract

It is well known that ensembles of different models can achieve better performance than any single member. Unfortunately, training multiple deep neural networks is time-consuming and expensive. Ensembles also require more storage space and more computation at inference time, making them unsuitable for applications with limited resources, such as mobile and embedded devices. To address these two fundamental problems in ensemble learning, we propose a novel method called Snapshot-Guided Adversarial Training (SGAT) that automatically accumulates and distills knowledge from the training process: knowledge from the earlier training iterations of a network is distilled and transferred into its later training iterations via an adversarial learning strategy. To accumulate knowledge during training, we employ a cyclic annealing schedule and take a model snapshot at the end of each training cycle. Furthermore, we use a shared discriminator to encourage the distillation process. The main advantages of our method are: (1) we can directly train a network with its free snapshots for knowledge distillation, instead of depending heavily on pre-trained models; (2) the inference cost remains the same as that of a single model after knowledge transfer is finished; (3) SGAT is a general method and can be applied to existing network architectures. Our extensive experiments show that SGAT consistently outperforms the standard training method by a clear margin. For example, with the same training budget, it achieves 2.86% higher accuracy on average than the baseline when training MobileNet-V2 from scratch on CIFAR-100. Meanwhile, SGAT also outperforms existing ensemble and knowledge distillation methods in most cases. For example, when training MobileNet-V2 from scratch on ImageNet, it achieves 0.7% higher accuracy than snapshot ensembles and 0.93% higher accuracy than snapshot distillation.
More importantly, our accumulated learning strategy enables SGAT to achieve much better performance as training time increases. For example, compared with the standard training method for MobileNet-V2, it achieves 3.13% higher accuracy on ImageNet.
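The snapshot-collection mechanism described above can be sketched in plain Python. The sketch below is illustrative only and assumes the common cyclic cosine-annealing form used by snapshot-ensemble methods; the function names, cycle count, and base learning rate are hypothetical and not taken from the paper. A snapshot would be saved at the final step of each cycle, where the learning rate reaches its minimum.

```python
import math


def cyclic_cosine_lr(step, total_steps, num_cycles, base_lr):
    """Cyclic cosine-annealed learning rate (a common snapshot-ensemble
    schedule; a sketch, not the paper's exact formula).

    The schedule restarts every total_steps // num_cycles steps, so the
    learning rate is highest at the start of each cycle and decays to
    its minimum at the cycle's end.
    """
    cycle_len = total_steps // num_cycles
    pos = step % cycle_len  # position within the current cycle
    return base_lr / 2 * (math.cos(math.pi * pos / cycle_len) + 1)


def is_snapshot_step(step, total_steps, num_cycles):
    """True at the last step of each cycle, when a snapshot is saved."""
    cycle_len = total_steps // num_cycles
    return (step + 1) % cycle_len == 0


# Example: 100 training steps split into 5 cycles of 20 steps each.
snapshots = [s for s in range(100) if is_snapshot_step(s, 100, 5)]
# snapshots -> [19, 39, 59, 79, 99]
```

In a full training loop these snapshots would serve as the "free teachers" for the adversarial distillation stage, so no separately pre-trained model is required.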
