Abstract

In this paper, we propose a novel mechanism for alleviating negative transfer in multi-task learning (MTL). Multi-task learning aims to learn general meta-knowledge by sharing inductive bias across tasks, thereby improving generalization. However, MTL suffers from negative transfer, in which the performance improvement of one task leads to performance degradation of other tasks. Because tasks compete for shared capacity, multi-task learning is essentially a multi-objective problem and requires a trade-off among the individual task optima. Inspired by knowledge distillation, a set of task-specific teacher models can transfer their knowledge to a single multi-task student model without significant loss in performance. In other words, individual optimality can be approached via multi-teacher knowledge distillation, where each teacher model is optimal for its own task and contains sufficient general meta-knowledge. Therefore, we propose multi-task distillation to cope with negative transfer, turning the multi-objective optimization problem into a multi-teacher knowledge distillation problem. Specifically, we first collect the optimal task-specific teacher models and then pursue the individual optimality of the student model through knowledge distillation. Extensive experimental results on different benchmark datasets demonstrate the effectiveness of our method over state-of-the-art multi-task learning formulations and single-task training.
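To make the pipeline described above concrete, the following is a minimal sketch of multi-teacher distillation into a multi-task student. It assumes frozen task-specific teachers, a shared-backbone student with one head per task, and temperature-scaled soft-label distillation; all names (e.g., `MultiTaskStudent`, `distillation_loss`, the temperature value) are illustrative and not taken from the paper.

```python
# Minimal sketch of multi-teacher distillation for multi-task learning.
# Assumptions (not from the paper): frozen task-specific teachers, a
# shared-backbone student with one head per task, and KL distillation
# with temperature scaling. All names here are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTaskStudent(nn.Module):
    def __init__(self, in_dim, hidden_dim, num_classes_per_task):
        super().__init__()
        # Shared backbone intended to capture the general meta-knowledge.
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        # One lightweight head per task.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, c) for c in num_classes_per_task]
        )

    def forward(self, x):
        z = self.backbone(x)
        return [head(z) for head in self.heads]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)


def multi_task_distillation_step(student, teachers, x, optimizer, temperature=2.0):
    """One training step: distill each frozen teacher into its student head."""
    student_outputs = student(x)
    loss = 0.0
    for k, teacher in enumerate(teachers):
        with torch.no_grad():  # teachers are frozen, already optimal for their tasks
            teacher_logits = teacher(x)
        loss = loss + distillation_loss(student_outputs[k], teacher_logits, temperature)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # Toy example: two tasks with 3 and 5 classes, random frozen teachers.
    torch.manual_seed(0)
    teachers = [nn.Linear(16, 3).eval(), nn.Linear(16, 5).eval()]
    student = MultiTaskStudent(in_dim=16, hidden_dim=32, num_classes_per_task=[3, 5])
    optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
    x = torch.randn(8, 16)
    print(multi_task_distillation_step(student, teachers, x, optimizer))
```

In this sketch the per-task distillation terms stand in for the competing supervised task losses, which is one way to read the paper's claim of turning the multi-objective optimization problem into a multi-teacher distillation problem; the actual loss weighting and training schedule used by the authors may differ.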
