Abstract

Serverless computing allows cloud users to deploy and run applications without managing physical or virtual hardware. Because serverless computing scales easily via function replication, a growing trend is to use it to run large, distributed workloads without provisioning clusters of physical or virtual machines. Recent work has successfully deployed serverless applications for data analytics, machine learning, linear algebra, and video processing, among others. Many of these workloads are embarrassingly parallel and follow the stateless function execution paradigm for which serverless computing is designed. However, some applications, particularly those implementing data pipelines, require state sharing between data processing stages. These workloads have a high degree of parallelism and scale easily with the number of concurrent functions, but they rely on slow cloud storage to communicate data between functions. Current serverless deployments run functions in containers or lightweight virtual machines with limited memory, computation power, and execution time. A direct communication path between functions would therefore need to be ephemeral and operate under constrained resources. Such a path also raises additional challenges: serverless providers use network firewalls to block inbound connections, and the performance and scaling characteristics of a direct communication path are entirely opaque to users. This chapter presents an ephemeral communication framework for serverless environments that uses direct network connections between functions. The framework has been successfully deployed on a production serverless computing offering, AWS Lambda.
The insight behind the proposed framework is that current serverless computing environments use a common networking configuration, Network Address Translation (NAT), to allow outbound connections from functions while blocking inbound connections, so NAT traversal can be used to establish direct connections between functions. This work presents the design and implementation of an ephemeral communication library for AWS Lambda. The library includes function and server components so that serverless applications can use network communication easily. It specifies an interface for the serverless application code that runs on each function, supports multi-function jobs, and manages communication between functions automatically. This work also implements an orchestrator server that invokes functions and relays control messages between them. An external server is necessary to perform NAT traversal, and it is also used for coordination. By using network connections, the proposed library achieves high performance and scales well in workloads with over 100 functions. This work measures a throughput of 680 Mbps between a pair of functions and verifies that this is the maximum throughput achievable on the current AWS Lambda offering. It also evaluates the framework using a multi-stage reduce-by-key application. Compared to an equivalent implementation using object storage, the library is 4.7 times faster and costs only 52% as much.
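
The abstract does not describe how the external server carries out NAT traversal. As a rough illustration only, the sketch below assumes the common rendezvous pattern: each function opens an outbound connection to the server, the server records the public address and port it observes for that function, and once every function in the job has registered it hands each function the endpoints of its peers. The class name `Rendezvous`, its methods, and the addresses used are hypothetical and are not taken from the library; the actual connection attempts between functions are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# (public IP, public port) as observed by the external server.
Endpoint = Tuple[str, int]


@dataclass
class Rendezvous:
    """Hypothetical in-memory registry kept by the external server."""

    job_size: int  # number of functions expected in the job
    endpoints: Dict[int, Endpoint] = field(default_factory=dict)

    def register(self, function_id: int, observed: Endpoint) -> None:
        # Record the NAT-translated endpoint seen on the function's
        # outbound connection to the server.
        self.endpoints[function_id] = observed

    def ready(self) -> bool:
        # True once every function in the job has registered.
        return len(self.endpoints) == self.job_size

    def peers_of(self, function_id: int) -> Dict[int, Endpoint]:
        # Return the endpoints a function should try to connect to.
        if not self.ready():
            raise RuntimeError("not all functions have registered yet")
        return {
            fid: ep for fid, ep in self.endpoints.items() if fid != function_id
        }


# Example with two functions and documentation-range addresses.
server = Rendezvous(job_size=2)
server.register(0, ("203.0.113.10", 40001))
server.register(1, ("203.0.113.11", 40002))
print(server.peers_of(0))  # {1: ('203.0.113.11', 40002)}
```

In this pattern the server only exchanges addresses and control messages; the data itself would flow over the direct connections that the functions open to each other afterwards.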
