Coded Computing: Mitigating Fundamental Bottlenecks in Large-Scale Distributed Computing and Machine Learning

Salman Avestimehr, Songze Li

doi:10.1561/0100000103

Abstract

Recent years have witnessed rapid growth in large-scale machine learning and big data analytics, facilitating the development of data-intensive applications such as voice/image recognition, real-time mapping services, autonomous driving, social networks, and augmented/virtual reality. These applications are supported by cloud infrastructures composed of large datacenters. Large-scale distributed machine learning and data analytics systems provide the processing power these applications need, but they suffer from three major performance bottlenecks: communication, stragglers, and security. In this ground-breaking monograph, the authors introduce the novel concept of Coded Computing. Coded Computing exploits coding theory to optimally inject and leverage data/task redundancy in distributed computing systems, creating coding opportunities to overcome these bottlenecks. After introducing the reader to the core of the problem, the authors describe in detail each of the bottlenecks that Coded Computing can overcome. The monograph provides an accessible introduction to how this new technique can be used in developing large-scale computing systems.
