Abstract

Background: The MapReduce programming model was designed at Google, alongside the Google File System, to efficiently process large distributed datasets. Apache Hadoop is the open-source implementation of this work. The Hadoop architecture comprises the Hadoop Distributed File System (HDFS) and Hadoop MapReduce: HDFS lets Hadoop manage large datasets effectively across the cluster, while MapReduce supports efficient large-scale distributed data processing. MapReduce incorporates strategies to speculatively re-execute slow tasks on other nodes so that computation finishes quickly, enhancing the overall Quality of Service (QoS). Several mechanisms have been proposed to improve on Hadoop's default scheduler, such as Longest Approximate Time to End (LATE), the Self-Adaptive MapReduce scheduler (SAMR), and the Enhanced Self-Adaptive MapReduce scheduler (ESAMR), all of which refine speculative re-execution of tasks over the cluster. Objective: The aim of this research is to develop an efficient speculative-task detection mechanism that improves the overall QoS offered by a Hadoop cluster. Methods: Our studies suggest that regularly tracking each node's performance is important for re-executing speculative tasks more efficiently. Results: Compared with existing mechanisms, we reduced the detection time of speculative tasks by ~15% and improved the accuracy of correct speculative-task detection by ~10%. Conclusion: This paper presents an efficient speculative-task detection algorithm for MapReduce schedulers to improve the QoS offered by Hadoop clusters.
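As context for the speculative-execution schedulers named above, the following is a minimal sketch of the LATE-style heuristic: estimate each running task's time to end from its observed progress rate and speculate on the task expected to finish last among the slowest ones. This is an illustration of the general idea, not the paper's algorithm; the class and function names, and the slow-task threshold, are hypothetical.

```python
# Illustrative LATE-style speculative task detection (not the paper's method).
from dataclasses import dataclass

@dataclass
class RunningTask:
    task_id: str
    progress: float      # progress score in [0, 1]
    elapsed: float       # seconds since the task started

def estimated_time_to_end(task: RunningTask) -> float:
    """LATE heuristic: assume the observed progress rate stays constant."""
    rate = task.progress / task.elapsed if task.elapsed > 0 else 0.0
    if rate == 0.0:
        return float("inf")  # no progress yet: treat as the worst case
    return (1.0 - task.progress) / rate

def pick_speculative_candidate(tasks, slow_fraction=0.25):
    """Among the slowest `slow_fraction` of tasks (by progress rate),
    return the one with the longest estimated time to end."""
    by_rate = sorted(tasks, key=lambda t: t.progress / t.elapsed)
    slow = by_rate[: max(1, int(len(by_rate) * slow_fraction))]
    return max(slow, key=estimated_time_to_end)

tasks = [
    RunningTask("t1", progress=0.9, elapsed=60),
    RunningTask("t2", progress=0.2, elapsed=60),  # straggler
    RunningTask("t3", progress=0.8, elapsed=60),
]
print(pick_speculative_candidate(tasks).task_id)  # t2
```

Schedulers such as SAMR and ESAMR refine this idea by weighting the progress estimate with per-node historical performance, which is also the direction the abstract's method takes (tracking node performance over time).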

