Abstract

Matrix multiplication is a fundamental operation in many machine learning algorithms. As dataset sizes grow rapidly, it is now common practice to compute large-scale matrix multiplications on multiple servers, with each server running a task that multiplies submatrices of the input matrices. Since straggling servers are inevitable in a distributed infrastructure, various coding schemes have been proposed that deploy coded tasks encoded from the input matrices, so that the overall result can be decoded from a subset of such coded tasks. However, because resources in a distributed infrastructure are shared with other jobs and their performance can change dynamically, the optimal way to encode the input matrices may also change over time. Existing coding schemes for matrix multiplication all require splitting the input matrices and encoding them in advance, and cannot change the coding scheme or adjust its parameters after encoding. In this paper, we propose a framework that can change the coding scheme and its parameters by only locally re-encoding each task on each server. We demonstrate that the original tasks can be re-encoded into new tasks while incurring only marginal overhead.
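
To make the setting concrete, below is a minimal sketch (not the re-encoding framework proposed in this paper) of the kind of coded distributed matrix multiplication the abstract refers to: the input matrix A is split into k row blocks, encoded into n coded blocks with an MDS (Vandermonde) generator, each server multiplies one coded block with B, and the product A @ B is decoded from any k of the n task results. All names and parameters here (k, n, the evaluation points) are illustrative assumptions.

```python
# Illustrative sketch of MDS-coded distributed matrix multiplication
# with straggler tolerance; parameters are assumptions, not the paper's.
import numpy as np

k, n = 3, 5                      # k source blocks, n coded tasks
m, d, p = 6, 4, 4                # A is (m x d), B is (d x p); k divides m
rng = np.random.default_rng(0)
A = rng.standard_normal((m, d))
B = rng.standard_normal((d, p))

# Encode: split A into k row blocks and mix them with a Vandermonde
# generator; any k of its n rows form an invertible (MDS) submatrix.
blocks = A.reshape(k, m // k, d)                 # k blocks of shape (m/k, d)
evals = np.arange(1, n + 1, dtype=float)         # distinct evaluation points
G = np.vander(evals, k, increasing=True)         # (n x k) MDS generator
coded = np.einsum('ij,jrd->ird', G, blocks)      # n coded input blocks

# Each "server" i runs one coded task: coded[i] @ B.
results = [coded[i] @ B for i in range(n)]

# Decode from any k surviving tasks; here servers 1 and 3 straggle.
alive = [0, 2, 4]
Gs = G[alive]                                    # (k x k) invertible submatrix
stacked = np.stack([results[i] for i in alive])  # (k, m/k, p)
decoded = np.einsum('ij,jrp->irp', np.linalg.inv(Gs), stacked)
C_hat = decoded.reshape(m, p)

assert np.allclose(C_hat, A @ B)                 # full product recovered
```

Note that the generator G is fixed once the coded blocks are distributed; changing k, n, or the code itself would ordinarily require re-splitting and re-encoding A from scratch, which is the limitation this paper's local re-encoding framework addresses.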
