Abstract

The MapReduce programming model was designed and developed by Google to efficiently process large-scale data sets distributed over the Google File System. The open-source implementation of this Google project is Apache Hadoop. The Hadoop architecture comprises Hadoop MapReduce and the Hadoop Distributed File System (HDFS). HDFS enables Hadoop to manage data sets effectively across the cluster, while the MapReduce programming paradigm supports the efficient processing of large data sets. MapReduce strategically re-executes a speculative task on another node to finish the computation quickly, enhancing the overall Quality of Service (QoS). Several mechanisms have been proposed over Hadoop's Default Scheduler to improve speculative task execution on a Hadoop cluster, and a large number of strategies have also been proposed for scheduling jobs with deadlines. However, the mechanisms for speculative task execution were either not developed for, or not well integrated with, Deadline Schedulers. This article presents an improved speculative task detection algorithm designed specifically for the Deadline Scheduler. Our studies highlight the importance of keeping regular track of each node's performance in order to re-execute speculative tasks more efficiently. We have improved the QoS offered by Hadoop clusters on jobs arriving with deadlines in terms of the percentage of successfully completed jobs, the detection time of speculative tasks, the accuracy of correct speculative task detection, and the percentage of incorrectly flagged speculative tasks.
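
The core idea described above, flagging a running task as speculative when its projected finish time would miss the job deadline and re-executing it on the historically fastest node, can be illustrated with a minimal sketch. This is not the authors' actual algorithm: the class names (RunningTask, SpeculativeDetectorSketch), the progress-rate estimate, and the node-throughput table are all hypothetical assumptions introduced here for illustration.

```java
import java.util.*;

/**
 * Minimal sketch (hypothetical, not the article's algorithm): flag a running
 * task as speculative when its estimated finish time, extrapolated from its
 * observed progress rate, would exceed the time remaining until the job
 * deadline, and pick the re-execution target from tracked node performance.
 */
public class SpeculativeDetectorSketch {

    /** Hypothetical task snapshot: progress in [0,1] and elapsed seconds. */
    record RunningTask(String id, double progress, double elapsedSec) {}

    /** Estimate remaining time, assuming the observed progress rate holds. */
    static double estimatedRemainingSec(RunningTask t) {
        double rate = t.progress() / Math.max(t.elapsedSec(), 1e-9); // progress/sec
        return (1.0 - t.progress()) / Math.max(rate, 1e-9);
    }

    /** Flag tasks whose projected finish time would miss the deadline. */
    static List<RunningTask> detectSpeculative(List<RunningTask> tasks,
                                               double secToDeadline) {
        List<RunningTask> flagged = new ArrayList<>();
        for (RunningTask t : tasks) {
            if (estimatedRemainingSec(t) > secToDeadline) flagged.add(t);
        }
        return flagged;
    }

    /** Re-execution target: the node with the best tracked throughput. */
    static String fastestNode(Map<String, Double> nodeThroughput) {
        return Collections.max(nodeThroughput.entrySet(),
                Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        List<RunningTask> tasks = List.of(
                new RunningTask("map-001", 0.90, 60),   // on track
                new RunningTask("map-002", 0.20, 120)); // straggler
        Map<String, Double> nodeThroughput = Map.of("node-a", 3.1, "node-b", 7.4);

        for (RunningTask t : detectSpeculative(tasks, 300)) {
            System.out.printf("re-execute %s on %s%n", t.id(),
                    fastestNode(nodeThroughput));
        }
    }
}
```

With these numbers, map-002's extrapolated remaining time (about 480 s) exceeds the 300 s left to the deadline, so only it is flagged and routed to node-b, the node with the best tracked throughput. A deadline-aware detector of this kind differs from a progress-score-only heuristic in that the deadline, not the average task progress, sets the flagging threshold.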
