Abstract

Large datasets have become increasingly necessary for most deep learning tasks; however, they also bring problems such as disk storage and heavy computational expense. Dataset distillation is an emerging field that aims to synthesize a small dataset from an original dataset such that a randomly initialized model trained on the distilled dataset achieves performance comparable to a model of the same architecture trained on the original dataset. Matching Training Trajectories (MTT) achieves leading performance in this field, but it must pre-train 200 expert models before the formal distillation process, a stage called the buffer process. In this paper, we propose a new method to reduce the time consumed by the buffer process. Concretely, we use a Mixture of Experts (MoE) to train several expert models in parallel during the buffer process. Experiments show that our method achieves a speedup of approximately 4∼8× in the buffer process while obtaining comparable distillation performance.
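
The abstract does not specify the exact MoE architecture used in the buffer process, so the following is only a minimal sketch of the general idea: several expert networks and a soft gate are trained jointly in one pass over the data, and each expert's parameter snapshots are stored as a training trajectory for the later trajectory-matching stage. All names (ExpertMLP, SimpleMoE, buffer_phase) and hyperparameters are hypothetical, not taken from the paper.

# Illustrative sketch only; not the paper's actual implementation.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExpertMLP(nn.Module):
    def __init__(self, in_dim, num_classes):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, num_classes))

    def forward(self, x):
        return self.net(x)

class SimpleMoE(nn.Module):
    """K experts updated in a single forward/backward pass, plus a soft gate."""
    def __init__(self, in_dim, num_classes, num_experts=8):
        super().__init__()
        self.experts = nn.ModuleList(ExpertMLP(in_dim, num_classes)
                                     for _ in range(num_experts))
        self.gate = nn.Linear(in_dim, num_experts)

    def forward(self, x):
        weights = F.softmax(self.gate(x), dim=-1)             # (B, K)
        outs = torch.stack([e(x) for e in self.experts], 1)   # (B, K, C)
        return (weights.unsqueeze(-1) * outs).sum(dim=1)      # (B, C)

def buffer_phase(loader, in_dim, num_classes, num_experts=8, epochs=10):
    """Train all experts jointly and record each expert's parameter trajectory."""
    moe = SimpleMoE(in_dim, num_classes, num_experts)
    opt = torch.optim.SGD(moe.parameters(), lr=0.01, momentum=0.9)
    trajectories = [[] for _ in range(num_experts)]
    for epoch in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = F.cross_entropy(moe(x.flatten(1)), y)
            loss.backward()
            opt.step()
        # Snapshot each expert's parameters as one point on its trajectory,
        # to be consumed later by the trajectory-matching distillation stage.
        for k, expert in enumerate(moe.experts):
            trajectories[k].append(copy.deepcopy(expert.state_dict()))
    return trajectories

The point of the sketch is the amortization: one optimization loop produces several expert trajectories at once, instead of running one full training loop per expert as in the original MTT buffer process.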
