Abstract

The scarcity of large training datasets limits the ability of deep learning to train highly accurate models. Few-shot learning (i.e., learning from only a few data samples) is commonly realized through meta-learning, a learn-to-learn approach. Most gradient-based meta-learning approaches are hierarchical in nature and computationally expensive. Meta-learning approaches generalize well to new tasks after training on relatively few tasks, but they require many training iterations, which leads to long training times. In this paper, we propose a generic approach to accelerate the training of meta-learning algorithms by leveraging a distributed training setup: training is conducted on multiple worker nodes, with each node processing a subset of the tasks. To illustrate the efficacy of our approach, we propose QMAML (Quick MAML), a distributed variant of the MAML (Model-Agnostic Meta-Learning) algorithm. MAML is one of the most popular meta-learning algorithms; it estimates initialization parameters for a meta-model that similar new tasks can use for faster adaptation. However, being hierarchical in nature, MAML is computationally expensive. In QMAML, the learning tasks are run on multiple workers to accelerate training and, as in the standard distributed training paradigm, the gradients from the learning tasks are consolidated to update the meta-model. We implement QMAML using Horovod, a lightweight distributed training library. Our experiments show that QMAML reduces the training time of MAML by 50% compared with the open-source library learn2learn on image recognition tasks, which are quasi-benchmark tasks in the field of meta-learning.
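The following is a minimal sketch of the general idea described above (distributing MAML's meta-batch of tasks across workers and averaging the outer-loop gradients with Horovod). It is not the authors' QMAML implementation: the synthetic sine-regression task sampler, the model, and all hyperparameters are illustrative assumptions, and only standard learn2learn and Horovod PyTorch APIs are used.

```python
# Sketch: distributing MAML's outer loop across Horovod workers.
# NOT the authors' QMAML code; task sampler and hyperparameters are illustrative.
import torch
import torch.nn as nn
import horovod.torch as hvd
import learn2learn as l2l

hvd.init()
torch.manual_seed(1234 + hvd.rank())  # each worker samples its own subset of tasks

model = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
maml = l2l.algorithms.MAML(model, lr=0.01)            # inner-loop learning rate
opt = torch.optim.Adam(maml.parameters(), lr=0.001)   # outer-loop (meta) optimizer
loss_fn = nn.MSELoss()

# Start every worker from identical meta-parameters and optimizer state.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(opt, root_rank=0)

def sample_task(k=10):
    """Hypothetical task sampler: a random sine wave, k support + k query points."""
    amp, phase = torch.rand(1) * 4 + 1, torch.rand(1) * 3.14
    x = torch.rand(2 * k, 1) * 10 - 5
    y = amp * torch.sin(x + phase)
    return (x[:k], y[:k]), (x[k:], y[k:])

tasks_per_worker = 8  # global meta-batch size = tasks_per_worker * hvd.size()

for iteration in range(1000):
    opt.zero_grad()
    meta_loss = 0.0
    for _ in range(tasks_per_worker):
        (xs, ys), (xq, yq) = sample_task()
        learner = maml.clone()                    # task-specific copy of the meta-model
        learner.adapt(loss_fn(learner(xs), ys))   # inner-loop adaptation step
        meta_loss = meta_loss + loss_fn(learner(xq), yq)
    (meta_loss / tasks_per_worker).backward()     # meta-gradient w.r.t. meta-parameters

    # Consolidate (average) gradients across workers, then update identically everywhere.
    for p in maml.parameters():
        if p.grad is not None:
            p.grad.data = hvd.allreduce(p.grad.data)
    opt.step()
```

Under these assumptions, each worker handles only its share of the meta-batch, so the per-iteration compute scales down with the number of workers; a run such as `horovodrun -np 4 python this_script.py` would launch four such workers.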
