FuncPipe: A Pipelined Serverless Framework for Fast and Cost-efficient Training of Deep Learning Models

Yunzhuo Liu, Chenghu Zhou, Wenhao Ma, Tian Guo, Bo Jiang, Xinbing Wang, Zimeng Huang

doi:10.1145/3578338.3593543

https://doi.org/10.1145/3578338.3593543

Abstract

Training deep learning (DL) models in the cloud has become the norm. With the emergence of serverless computing and its benefits of true pay-as-you-go pricing and scalability, systems researchers have recently started to provide support for serverless-based training. However, training DL models on serverless platforms is hindered by the resource limitations of today's serverless infrastructure and by the explosive memory and bandwidth requirements of DL models. This paper describes FuncPipe, a novel pipelined training framework designed specifically for serverless platforms that enables fast and low-cost training of DL models. FuncPipe is built on the key insight that model partitioning can bridge both the memory gap and the bandwidth gap between the capacity of serverless functions and the requirements of DL training. Although the idea is conceptually simple, realizing it requires answering several design questions, including how to partition the model, how to configure each serverless function, and how to exploit each function's uplink and downlink bandwidth. We implement FuncPipe on two popular cloud serverless platforms and show that it achieves 7% to 77% cost savings and a 1.3x to 2.2x speedup compared to state-of-the-art serverless-based frameworks.
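To make the partitioning insight concrete, here is a minimal sketch of the memory side of the problem, assuming a hypothetical per-layer memory profile and a fixed per-function memory cap. It greedily packs consecutive layers into pipeline stages, one stage per serverless function, so that no stage exceeds the cap. The helper name `partition_layers` and all numbers are illustrative assumptions, not FuncPipe's actual partitioning algorithm; the other design questions named above (function configuration and uplink/downlink bandwidth) are ignored here.

```python
from typing import List


def partition_layers(layer_mem_mb: List[float], func_mem_limit_mb: float) -> List[List[int]]:
    """Greedily group consecutive layer indices into pipeline stages.

    Hypothetical helper for illustration only: each stage's summed memory
    estimate must fit within one serverless function's memory limit.
    """
    stages: List[List[int]] = []
    current: List[int] = []
    used = 0.0
    for idx, mem in enumerate(layer_mem_mb):
        # A single layer that exceeds the cap cannot be placed by this sketch.
        if mem > func_mem_limit_mb:
            raise ValueError(
                f"layer {idx} needs {mem} MB, exceeding the {func_mem_limit_mb} MB limit"
            )
        # Close the current stage when adding this layer would overflow it.
        if current and used + mem > func_mem_limit_mb:
            stages.append(current)
            current, used = [], 0.0
        current.append(idx)
        used += mem
    if current:
        stages.append(current)
    return stages


if __name__ == "__main__":
    # Hypothetical per-layer footprints (MB) and a hypothetical 2048 MB function cap.
    profile = [900, 1200, 700, 1500, 800, 600]
    print(partition_layers(profile, 2048))  # [[0], [1, 2], [3], [4, 5]]
```

Greedy contiguous packing keeps layer order intact, which pipelining requires, but it only minimizes nothing in particular; a real partitioner would also weigh per-stage compute time and the size of activations crossing stage boundaries.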