Distributed multi-task learning (MTL) jointly learns multiple models and can achieve better generalization performance by exploiting information shared across related tasks. However, distributed MTL suffers from communication bottlenecks, in particular for large-scale learning with a massive number of tasks. This paper considers distributed MTL systems in which distributed workers wish to learn different models orchestrated by a central server. To mitigate communication bottlenecks in both the uplink and the downlink, we propose coded computing schemes for flexible and for fixed data placement. Our schemes significantly reduce communication loads by exploiting workers' local information and creating multicast opportunities for both the server and the workers. Moreover, we establish information-theoretic lower bounds on the optimal downlink and uplink communication loads, and prove the approximate optimality of the proposed schemes. For flexible data placement, our scheme achieves the optimal downlink communication load and an order-optimal uplink communication load that is less than twice the information-theoretic optimum. For fixed data placement, the gaps between our communication loads and the optima are bounded by the minimum computation load among all workers, regardless of the number of workers. Experiments demonstrate that our schemes significantly speed up training compared to the traditional approach.
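In symbols, the stated guarantees can be sketched roughly as follows (a sketch only; the notation L_d, L_u for the achieved downlink and uplink loads, L_d^*, L_u^* for their information-theoretic optima, and m_min for the minimum computation load among the workers is assumed here for illustration, not taken from the paper):

% Guarantees from the abstract, written in the assumed notation above.
\[
  \text{Flexible placement: } \; L_d = L_d^{*}, \qquad L_u < 2\,L_u^{*};
\]
\[
  \text{Fixed placement: } \; L_d - L_d^{*} \le m_{\min}, \qquad L_u - L_u^{*} \le m_{\min}.
\]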