Abstract

Amazon Web Services (AWS) Lambdas and other cloud functions (CFs) offer much lower startup latencies than virtual machines (VMs) (tens to hundreds of milliseconds vs. several minutes) at a lower minimum cost. This makes them appealing for handling unexpected spikes in simple, stateless workloads [2, 3, 5]. If a spike persists, additional VMs may be launched and the CFs decommissioned once the VMs are ready (VMs are cheaper per unit of procured resource than CFs). However, it is not immediately clear whether using CFs for complex workloads, those involving significant state exchange among components, is similarly effective. Current CFs have several restrictions that may limit their efficacy: (i) relatively limited resource capacity, especially main memory (e.g., an AWS Lambda may only have up to 3 GB of memory), (ii) limited lifetime (e.g., Lambdas are terminated after 15 minutes), and (iii) limited support for sharing intermediate state (e.g., Lambdas must employ an external storage system such as AWS S3). Contrary to conventional wisdom, we show that it is possible to exploit the faster startup times of CFs to improve the cost and performance of autoscaling even for complex workloads.

Approach: We design SplitServe [1], implemented as an enhancement of Apache Spark [4], which can simultaneously use AWS VMs and Lambdas to serve the tasks comprising a parallel Spark job. The most salient challenges addressed and design choices made are: (i) State exchange: Instead of relying on slower external cloud storage to transfer state, we leverage the resources of the already procured VMs and employ HDFS for state exchange. We find that this allows both VMs and Lambdas to achieve throughputs close to those of local disks, and since we use already provisioned disk capacity, we pay nothing extra (as we would with, say, AWS S3). (ii) Segueing from Lambdas to newly available VMs: Simply killing ongoing tasks on Lambdas and rerunning them on newly available VMs triggers Spark's high-overhead fault-tolerance mechanisms. Instead, a lightweight scheduling decision, based on how long a Lambda has been running, is made at per-task granularity: as the time since a Lambda was launched approaches the common-case startup delay of a VM, new tasks are no longer sent to that Lambda. (Illustrative sketches of both mechanisms appear below.)

Findings: In our experiments, SplitServe reduces overall job execution time compared to the state of the art in both homogeneous execution environments (all VMs or all Lambdas) and heterogeneous ones (VMs and Lambdas simultaneously executing a job's tasks). For the heterogeneous case, our evaluation of SplitServe on four workloads (interactive TPC-DS, K-means clustering, PageRank, and Pi) shows that SplitServe-Spark improves performance by up to 55% for workloads with small to modest amounts of shuffling, and by up to 31% for workloads with large amounts of shuffling, compared to VM-only autoscaling. Moreover, with its novel segueing technique, SplitServe can reduce costs by up to 21% while still providing an almost 40% reduction in execution time.

Ongoing Work: We are designing a comprehensive autoscaling system that leverages SplitServe's capabilities, and we will empirically evaluate the performance and cost improvements such a system can offer over state-of-the-art solutions under diverse workloads that exhibit realistic dynamism and uncertainty.
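To make the state-exchange idea concrete, the following is a minimal sketch of writing a shuffle block to HDFS hosted on the already-provisioned VMs' disks, rather than to an external store such as S3. This is not SplitServe's actual code: the path layout, block naming, and NameNode address are hypothetical, and only the standard Hadoop FileSystem API is assumed.

```scala
// Sketch: a Lambda- or VM-resident task writes its shuffle output to HDFS
// running on the VM pool's disks, so no extra storage (e.g., S3) is billed.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsShuffleWriter {
  def writeBlock(shuffleId: Int, mapId: Int, data: Array[Byte]): Unit = {
    val conf = new Configuration()
    // fs.defaultFS points at the HDFS NameNode hosted on the VM pool;
    // "hdfs://namenode:8020" is a hypothetical address.
    conf.set("fs.defaultFS", "hdfs://namenode:8020")
    val fs = FileSystem.get(conf)
    // Hypothetical block layout: one file per (shuffle, map task) pair.
    val out = fs.create(new Path(s"/shuffle/$shuffleId/map_$mapId.data"))
    try out.write(data) finally out.close()
  }
}
```

Because both VMs and Lambdas read and write through the same HDFS namespace, intermediate state produced on one kind of executor is directly visible to the other, which is what lets the two resource types cooperate on a single job's shuffle.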
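The segueing policy itself can be expressed as a simple per-task admission check. The sketch below is our reading of the stated heuristic, not SplitServe's implementation; the class and parameter names (SegueScheduler, vmStartupDelayMs, safetyMarginMs) are hypothetical.

```scala
// Sketch of the segueing heuristic: once a Lambda's age approaches the
// common-case VM startup delay, stop assigning it new tasks so its
// in-flight tasks drain naturally and nothing needs to be killed and
// rerun (which would trigger Spark's fault-tolerance machinery).

case class LambdaExecutor(id: String, launchedAtMs: Long)

class SegueScheduler(vmStartupDelayMs: Long, safetyMarginMs: Long = 5000L) {
  /** Returns true if new tasks may still be dispatched to this Lambda. */
  def acceptsNewTasks(lambda: LambdaExecutor, nowMs: Long): Boolean = {
    val ageMs = nowMs - lambda.launchedAtMs
    // Leave a margin so tasks dispatched now can finish before the
    // newly launched VM is ready to take over.
    ageMs + safetyMarginMs < vmStartupDelayMs
  }
}

object SegueSchedulerDemo extends App {
  val scheduler = new SegueScheduler(vmStartupDelayMs = 120000L) // ~2 min VM startup
  val lambda = LambdaExecutor("lambda-1", System.currentTimeMillis() - 90000L)
  // 90 s old + 5 s margin < 120 s delay, so this still prints true.
  println(scheduler.acceptsNewTasks(lambda, System.currentTimeMillis()))
}
```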
