Cost Efficient Batch Processing in Amazon Cloud with Deadline Awareness

Kabin Tamrakar, Anis Yazidi, Harek Haugerud

doi:10.1109/aina.2017.170

Abstract

Amazon spot instances have become a popular alternative for saving costs in the cloud. However, spot instances are prone to abrupt termination whenever the spot market price exceeds the bid price. In this paper, spot instances are used in the task instance group of an Amazon Elastic MapReduce (EMR) cluster to process batch jobs with a deadline. Amazon EMR makes it convenient to process Big Data with the Hadoop framework. However, if spot instances are terminated, the intermediate results already processed on the cluster's task nodes are lost, which can delay processing. Cost efficiency can be achieved by exploiting the non-real-time nature of batch computing for Big Data. Two algorithms are devised for cost-efficient processing in Hadoop MapReduce. Both algorithms process data in divisions so that the abrupt termination of spot instances only affects the last division. Based on progress monitored at given checkpoints, the capacity of the task group is resized so that processing completes within the deadline. Progress is measured as the number of completed work divisions. The first algorithm begins with an initially estimated number of spot instances; to complete the processing of all data in time, on-demand instances are deployed after a certain threshold time. The second algorithm starts with more spot instances than are required to complete the work within the given deadline. Therefore, it has a higher chance of relying solely on spot instances during the whole execution of the batch job. On-demand instances are deployed only in case of slow progress caused by the termination of spot instances combined with subsequent unsuccessful bids. The experiments show that both algorithms are able to minimize the processing cost, and that the second algorithm reduces the cost further in most cases.
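The checkpoint-driven resizing described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each task instance completes work divisions at a constant, known rate (`rate_per_instance`), and the names `Checkpoint`, `instances_needed`, `plan_first`, `plan_second`, the `threshold` time and the over-provisioning `factor` are all hypothetical. It only computes target capacities as `(spot, on_demand)` pairs and does not call any AWS API.

```python
import math
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Progress snapshot taken at a checkpoint (hypothetical structure)."""
    elapsed: float            # hours since the batch job started
    completed: int            # completed work divisions
    total: int                # total work divisions in the job
    deadline: float           # hours allowed for the whole job
    rate_per_instance: float  # ASSUMPTION: divisions per instance-hour, constant


def instances_needed(cp: Checkpoint) -> int:
    """Task instances required to finish the remaining divisions by the deadline."""
    remaining_work = cp.total - cp.completed
    if remaining_work <= 0:
        return 0
    remaining_time = cp.deadline - cp.elapsed
    if remaining_time <= 0:
        raise ValueError("deadline already passed with work remaining")
    return math.ceil(remaining_work / (remaining_time * cp.rate_per_instance))


def plan_first(cp: Checkpoint, spot_running: int, threshold: float) -> tuple[int, int]:
    """First algorithm (sketch): spot only before `threshold`, then add on-demand."""
    needed = instances_needed(cp)
    if cp.elapsed < threshold:
        # Before the threshold time: keep bidding for spot capacity only.
        return needed, 0
    # After the threshold time: cover any shortfall with on-demand instances.
    return spot_running, max(0, needed - spot_running)


def plan_second(cp: Checkpoint, spot_running: int, factor: float = 1.5) -> tuple[int, int]:
    """Second algorithm (sketch): over-provision spot; on-demand only when behind."""
    needed = instances_needed(cp)
    # ASSUMPTION: a fixed over-provisioning factor on top of the required capacity.
    spot_target = math.ceil(needed * factor)
    # On-demand capacity is added only if running spot instances fall short.
    return spot_target, max(0, needed - spot_running)
```

For example, with 40 of 100 divisions done after 2 of 5 hours and an assumed rate of 2 divisions per instance-hour, `instances_needed` gives 10; with only 6 spot instances running, `plan_second` asks for 15 spot instances plus 4 on-demand ones to close the gap. Because progress is counted in whole divisions, recomputing the requirement at each checkpoint reacts to lost spot capacity without having to track partially processed work.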

Similar Papers

Performance and Reliability Effects of Multi-tier Bidding on MapReduce in Auction-Based Clouds
M Taifi ... J Y Shi
01 Mar 2013

How Small and Medium Enterprises (SMEs) Should Bid for Spot Instances of Amazon's EC2 Cloud
Debashis Saha
International Journal of Business Data Communications and Networking, Vol. 10, 01 Oct 2014

Large-scale Image Processing using Amazon EC2 Spot Instances
Youngsol Koh ... Yung-Hsiang Lu
Electronic Imaging, Vol. 28, 14 Feb 2016

Blending on-demand and spot instances to lower costs for in-memory storage
Zichen Xu ... Nan Deng
01 Apr 2016